ORIGEN DE LES DADES

PROJECTE FIS UB-HSJD

https://drive.google.com/drive/folders/16xYtE8mjm84OUzRUlYwuQskiSA1fj0Lo?usp=sharing

A dins de la carpeta Function, teniu: • maintable.tsv. En aquest fitxer teniu les proteĆÆnes que s’han trobat, la seva descripció, les rutes metabòliques i les Gene Ontologies (GO) corresponents. TambĆ© una columna per cada mostra amb el nĆŗmero de reads identificat per cada proteĆÆna. • Carpeta diff_analysis. En aquesta carpeta trobareu taules amb els resultats de l’anĆ lisi comparatiu (proteĆÆnes mĆ©s abundants entre condicions experimentals), taules amb les tĆ­pics grĆ fics (volcano plots, etc.), els resultats d’anĆ lisi d’enriquiment de GO i de rutes metabòliques. Trobareu una estructura de carpetes tipus:

fecal_Control__fecal_CU

Això vol dir que s’han comparat les condicions fecal_Control i fecal_CU. Dins d’aquesta carpeta trobareu fitxers tipus:

3a8c19f4d3c528_fecal_Control.xlsx 3a8c19f4d3c528_fecal_CU.xlsx

La referĆØncia que s’ha utilitzat Ć©s la que es mostra al nom del fitxer. Ɖs a dir, en el fitxer 3a8c19f4d3c528_fecal_Control.xlsx, la referĆØncia Ć©s fecal_Control.

Los resultados del anĆ”lisis funcional obtenido por Sequentia Biotech se han basado en el anĆ”lisis: Gene set enrichment analysis (GSEA) (also called functional enrichment analysis or pathway enrichment analysis) is a method to identify classes of genes or proteins that are over-represented in a large set of genes or proteins, and may have an association with different phenotypes. NOTA: Els resultats s’ han filtrat pel nivell de significació (FDR) p<0.001 i es parteix d’ una taula filtrada.

#INSTALLATION OF ALL THE PACKAGES

# List of packages to install
packages <- c("miscset", "ggplot2", "ggpubr", "dplyr", "httr", "tidyverse")

# Install packages if not already installed
install_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if (length(install_packages) > 0) {
  install.packages(install_packages, dependencies = TRUE)
}

# Load the installed packages
library(miscset)
library(ggplot2)
library(ggpubr)
library(dplyr)
library(httr)
library(tidyverse)

ANƀLISI DELS RESULTATS DE KEGG

DESCRIPCIO DE KEGG

Se ha realizado el anàlisis diferencial (enrichment analysis) segun ontologies de KEGG y GO. Los resultados se exponent a continuación.

ANƀLISI ESTADISTICA DEL KEGG

Se realiza un filtro sólo para los grupos: ā€œfecal.Control.control_vs_fecal.CU.3.mesesā€ ā€œfecal.Control.control_vs_fecal.CU.6.mesesā€ ā€œfecal.Control.control_vs_fecal.CU.debutā€ ā€œfecal.Control.control_vs_fecal.EC.6.mesesā€ ā€œfecal.CU.6.meses_vs_fecal.CU.debutā€ ā€œfecal.EC.3.meses_vs_fecal.Control.controlā€ ā€œfecal.EC.3.meses_vs_fecal.EC.6.mesesā€ ā€œfecal.EC.3.meses_vs_fecal.EC.debutā€ ā€œfecal.EC.debut_vs_fecal.EC.6.mesesā€

#####################REVISAR P-AVLOR < 0.001 Y UP-DOWN DE DONDE HAN SALIDO 27-9-2023###################################
####################################################

#################################################
# STATISTICAL ANALYSIS
################################################
#UPLOAD UNA VEZ SE HA OBTENIDO EL FICHERO
#setwd("D:/dropbox2023/Dropbox/otros analisis estadisticos 2015/HSJD 2017 javier martin de carpi SEQUENTIA/estudio_HSJD_2019_2020_fis_bioinformatica/2019_2020/analisis_funcional_2022_andreuPay")
getwd()
## [1] "/Users/alex/Downloads/analisis_functional"
FILE.KEGG.HSJD <- read.csv("/Users/alex/Downloads/analisis_functional/dataset_kegg.csv") #dades filtrades

#table(FILE.KEGG.HSJD$GROUP) ##CONTAR LOS GRUPOS --> 54

#LOS GRUPOS QUE SALEN SON:

#fecal.Control.control_vs_fecal.CU.3.meses     fecal.Control.control_vs_fecal.CU.6.meses       fecal.Control.control_vs_fecal.CU.debut 
#24                                            19                                            15 
#fecal.Control.control_vs_fecal.EC.6.meses      fecal.Control.control_vs_oral.EC.3.meses      fecal.Control.control_vs_oral.EC.6.meses 
##21                                            23                                            22 
#fecal.Control.control_vs_oral.EC.debut          fecal.CU.3.meses_vs_fecal.EC.6.meses             fecal.CU.3.meses_vs_oral.CU.debut 
#31                                            19                                            21 
#fecal.CU.3.meses_vs_oral.EC.6.meses             fecal.CU.3.meses_vs_oral.EC.debut            fecal.CU.6.meses_vs_fecal.CU.debut 
#21                                            25                                            11 
#fecal.CU.6.meses_vs_fecal.EC.6.meses            fecal.CU.6.meses_vs_fecal.EC.debut             fecal.CU.6.meses_vs_oral.CU.debut 
#18                                            13                                            19 
#fecal.CU.6.meses_vs_oral.EC.3.meses           fecal.CU.6.meses_vs_oral.EC.6.meses             fecal.CU.6.meses_vs_oral.EC.debut 
#24                                            21                                            24 
#fecal.CU.debut_vs_oral.EC.debut     fecal.EC.3.meses_vs_fecal.Control.control          fecal.EC.3.meses_vs_fecal.CU.3.meses 
#19                                            19                                            21 
#fecal.EC.3.meses_vs_fecal.CU.6.meses            fecal.EC.3.meses_vs_fecal.CU.debut          fecal.EC.3.meses_vs_fecal.EC.6.meses 
#16                                            15                                            18 
#fecal.EC.3.meses_vs_fecal.EC.debut             fecal.EC.3.meses_vs_oral.CU.debut           fecal.EC.3.meses_vs_oral.EC.3.meses 
#20                                            21                                            20 
#fecal.EC.3.meses_vs_oral.EC.6.meses             fecal.EC.3.meses_vs_oral.EC.debut             fecal.EC.6.meses_vs_oral.CU.debut 
#20                                            23                                            23 
#fecal.EC.6.meses_vs_oral.EC.6.meses             fecal.EC.6.meses_vs_oral.EC.debut            fecal.EC.debut_vs_fecal.CU.3.meses 
#22                                            28                                            24 
#fecal.EC.debut_vs_fecal.EC.6.meses               fecal.EC.debut_vs_oral.CU.debut             fecal.EC.debut_vs_oral.EC.3.meses 
#18                                            23                                            22 
#fecal.EC.debut_vs_oral.EC.6.meses               fecal.EC.debut_vs_oral.EC.debut oral.Control.control_vs_fecal.Control.control 
#21                                            32                                            29 
#oral.Control.control_vs_fecal.CU.3.meses      oral.Control.control_vs_fecal.CU.6.meses        oral.Control.control_vs_fecal.CU.debut 
#25                                            27                                            19 
#oral.Control.control_vs_fecal.EC.3.meses      oral.Control.control_vs_fecal.EC.6.meses        oral.Control.control_vs_fecal.EC.debut 
#22                                            26                                            25 
#oral.CU.3.meses_vs_fecal.CU.debut           oral.CU.6.meses_vs_fecal.CU.3.meses           oral.CU.6.meses_vs_fecal.CU.6.meses 
#16                                            19                                            21 
#oral.CU.6.meses_vs_fecal.EC.6.meses             oral.CU.6.meses_vs_fecal.EC.debut               oral.CU.debut_vs_fecal.CU.debut 
#20                                            20                                            16 
#oral.EC.3.meses_vs_fecal.CU.3.meses             oral.EC.3.meses_vs_fecal.CU.debut           oral.EC.3.meses_vs_fecal.EC.6.meses 
#22                                            22      


#UNICOS GRUPOS VALIDOS SON: filtrar les dades
#"fecal.Control.control_vs_fecal.CU.3.meses"
#"fecal.Control.control_vs_fecal.CU.6.meses" 
#"fecal.Control.control_vs_fecal.CU.debut"
#"fecal.Control.control_vs_fecal.EC.6.meses" 
#"fecal.CU.6.meses_vs_fecal.CU.debut" 
#"fecal.EC.3.meses_vs_fecal.Control.control" 
#"fecal.EC.3.meses_vs_fecal.EC.6.meses" 
#"fecal.EC.3.meses_vs_fecal.EC.debut" 
#"fecal.EC.debut_vs_fecal.EC.6.meses" 

FILE.KEGG.HSJD.sel<- FILE.KEGG.HSJD[(FILE.KEGG.HSJD$GROUP=="fecal.Control.control_vs_fecal.CU.3.meses" |
                                    FILE.KEGG.HSJD$GROUP=="fecal.Control.control_vs_fecal.CU.6.meses" | 
                                    FILE.KEGG.HSJD$GROUP=="fecal.Control.control_vs_fecal.CU.debut" |
                                    FILE.KEGG.HSJD$GROUP=="fecal.Control.control_vs_fecal.EC.6.meses" |
                                    FILE.KEGG.HSJD$GROUP=="fecal.CU.6.meses_vs_fecal.CU.debut" |
                                    FILE.KEGG.HSJD$GROUP=="fecal.EC.3.meses_vs_fecal.Control.control" |
                                    FILE.KEGG.HSJD$GROUP=="fecal.EC.3.meses_vs_fecal.EC.6.meses" |
                                    FILE.KEGG.HSJD$GROUP=="fecal.EC.3.meses_vs_fecal.EC.debut" |
                                    FILE.KEGG.HSJD$GROUP=="fecal.EC.debut_vs_fecal.EC.6.meses") ,]
#print("NĆŗmero de resultados por grupo experimental")                                      
#table(FILE.KEGG.HSJD.sel$GROUP) ##CONTAR LOS GRUPOS --> 165



#solo para ver un grupo por ejemplo control vs FECAL-CU
#FILE.KEGG.HSJD.sel1<- FILE.KEGG.HSJD.sel[(FILE.KEGG.HSJD.sel$GROUP=="fecal.Control.control_vs_fecal.CU.debut" ) ,]
#table(FILE.KEGG.HSJD.sel1$ONT_DESCRIPTION) ##CONTAR LOS GRUPOS --> 15



#volcano plot de todo el metabolismo KEGG de todos los grupos


#crear un df para el volcano plot
#vp<-FILE.KEGG.HSJD.sel1
#calcular la variable log2 foldchange (log base 2)
#vp$log2FoldChange<- log(vp$EA_VALUE)
#library(ggplot2)
#grafico de tipo volcano plot
#ggplot(data=vp, aes(x=log2FoldChange, y=-log10(FDR), col=KEGG_up_DOWN, label=ONT_DESCRIPTION)) + 
#  geom_point() + 
#  theme_minimal() +
#  geom_text()
#no sale nada

Es presenten tots els resultats

#representar TODAS LAS TABLAS DE RESULTADOS tablas

ggplotGrid(ncol = 2,
           lapply(c("ONT_DESCRIPTION", "sample",          "GROUP",          "KEGG_up_DOWN"),
                  function(col) {
                    ggplot(FILE.KEGG.HSJD.sel, aes_string(col)) + geom_bar() + coord_flip()
                  }))

ggplotGrid(ncol = 2,
           lapply(c("ONT_DESCRIPTION"),
                  function(col) {
                    ggplot(FILE.KEGG.HSJD.sel, aes_string(col)) + geom_bar() + coord_flip()
                  }))

Descriptiva de los grupos de CU

#para cada grupo de CU
FILE.KEGG.HSJD.sel.CU<- FILE.KEGG.HSJD[(FILE.KEGG.HSJD$GROUP=="fecal.Control.control_vs_fecal.CU.3.meses" |
                                       FILE.KEGG.HSJD$GROUP=="fecal.Control.control_vs_fecal.CU.6.meses" | 
                                       FILE.KEGG.HSJD$GROUP=="fecal.Control.control_vs_fecal.CU.debut" |
                                       FILE.KEGG.HSJD$GROUP=="fecal.CU.6.meses_vs_fecal.CU.debut" ) ,]

#recode group names
FILE.KEGG.HSJD.sel.CU$GROUP.rec<- case_match(FILE.KEGG.HSJD.sel.CU$GROUP, "fecal.Control.control_vs_fecal.CU.3.meses" ~ "F.C vs F.CU.3M",                       
           "fecal.Control.control_vs_fecal.CU.6.meses" ~ "F.C vs F.CU.6M",
           "fecal.Control.control_vs_fecal.CU.debut"~ "F.C vs F.CU.DE",
           "fecal.CU.6.meses_vs_fecal.CU.debut"~ "F.CU.6M vs F.CU.DE",
           .default = NA)

first_ancestor_count <- FILE.KEGG.HSJD.sel.CU %>%
  group_by(GROUP.rec, ONT_DESCRIPTION) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function with first_ancestor_count_2
FILE.KEGG.HSJD.sel.CU <- merge(FILE.KEGG.HSJD.sel.CU, first_ancestor_count, by = c("ONT_DESCRIPTION","GROUP.rec"), all.x = TRUE)

unique_combinations <- FILE.KEGG.HSJD.sel.CU %>%
  distinct(GROUP.rec,ONT_DESCRIPTION, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))

FILE.KEGG.HSJD.sel.CU <- left_join(FILE.KEGG.HSJD.sel.CU, unique_combinations, by = "GROUP.rec")

FILE.KEGG.HSJD.sel.CU <- FILE.KEGG.HSJD.sel.CU %>%
  mutate(gene_rate = count / total_counts)

mean_EA <- FILE.KEGG.HSJD.sel.CU %>%
  group_by(GROUP.rec,ONT_DESCRIPTION) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))
FILE.KEGG.HSJD.sel.CU <- left_join(FILE.KEGG.HSJD.sel.CU, mean_EA, by = c("GROUP.rec", "ONT_DESCRIPTION"))


ggballoonplot(FILE.KEGG.HSJD.sel.CU, x = "KEGG_up_DOWN", y = "ONT_DESCRIPTION", size = "gene_rate",
              fill = "mean_EA_VALUE", facet.by = "GROUP.rec",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "KEGG ontologies",
                                                                                    x= "UP/DOWN",
                                                                                    title="Goup: CU") +
# Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 5, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

Descriptiva de los grupos de EC

#PARA CADA GRUPO EC
FILE.KEGG.HSJD.sel.EC<- FILE.KEGG.HSJD[(FILE.KEGG.HSJD$GROUP=="fecal.EC.3.meses_vs_fecal.Control.control" |
                                       FILE.KEGG.HSJD$GROUP=="fecal.EC.3.meses_vs_fecal.EC.6.meses" |
                                       FILE.KEGG.HSJD$GROUP=="fecal.EC.3.meses_vs_fecal.EC.debut" |
                                       FILE.KEGG.HSJD$GROUP=="fecal.EC.debut_vs_fecal.EC.6.meses") ,]

#recode group names
FILE.KEGG.HSJD.sel.EC$GROUP.rec<- case_match(FILE.KEGG.HSJD.sel.EC$GROUP, "fecal.EC.3.meses_vs_fecal.Control.control" ~ "F.EC.3M vs F.C",                           
           "fecal.EC.3.meses_vs_fecal.EC.6.meses" ~ "F.EC.3M vs F.EC.6M",
           "fecal.EC.3.meses_vs_fecal.EC.debut"~ "F.EC.3M vs F.EC.DE",
           "fecal.EC.debut_vs_fecal.EC.6.meses"~ "F.EC.DE vs F.EC.6M",
           .default = NA)

first_ancestor_count <- FILE.KEGG.HSJD.sel.EC %>%
  group_by(GROUP.rec, ONT_DESCRIPTION) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function with first_ancestor_count_2
FILE.KEGG.HSJD.sel.EC <- merge(FILE.KEGG.HSJD.sel.EC, first_ancestor_count, by = c("ONT_DESCRIPTION","GROUP.rec"), all.x = TRUE)

unique_combinations <- FILE.KEGG.HSJD.sel.EC %>%
  distinct(GROUP.rec,ONT_DESCRIPTION, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))

FILE.KEGG.HSJD.sel.EC <- left_join(FILE.KEGG.HSJD.sel.EC, unique_combinations, by = "GROUP.rec")

FILE.KEGG.HSJD.sel.EC <- FILE.KEGG.HSJD.sel.EC %>%
  mutate(gene_rate = count / total_counts)

mean_EA <- FILE.KEGG.HSJD.sel.EC %>%
  group_by(GROUP.rec,ONT_DESCRIPTION) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))
FILE.KEGG.HSJD.sel.EC <- left_join(FILE.KEGG.HSJD.sel.EC, mean_EA, by = c("GROUP.rec", "ONT_DESCRIPTION"))

ggballoonplot(FILE.KEGG.HSJD.sel.EC, x = "KEGG_up_DOWN", y = "ONT_DESCRIPTION", size = "gene_rate",
              fill = "mean_EA_VALUE", facet.by = "GROUP.rec",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "KEGG ontologies",
                                                                                   x= "UP/DOWN",
                                                                                   title="Goup: EC") +
  # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 5, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#PARA CADA GRUPO EC
FILE.KEGG.HSJD.sel.EC<- FILE.KEGG.HSJD[(FILE.KEGG.HSJD$GROUP=="fecal.EC.3.meses_vs_fecal.Control.control" |
                                       FILE.KEGG.HSJD$GROUP=="fecal.EC.3.meses_vs_fecal.EC.6.meses" |
                                       FILE.KEGG.HSJD$GROUP=="fecal.EC.3.meses_vs_fecal.EC.debut" |
                                       FILE.KEGG.HSJD$GROUP=="fecal.EC.debut_vs_fecal.EC.6.meses") ,]

#recode group names
FILE.KEGG.HSJD.sel.EC$GROUP.rec<- case_match(FILE.KEGG.HSJD.sel.EC$GROUP, "fecal.EC.3.meses_vs_fecal.Control.control" ~ "F.EC.3M vs F.C",                           
           "fecal.EC.3.meses_vs_fecal.EC.6.meses" ~ "F.EC.3M vs F.EC.6M",
           "fecal.EC.3.meses_vs_fecal.EC.debut"~ "F.EC.3M vs F.EC.DE",
           "fecal.EC.debut_vs_fecal.EC.6.meses"~ "F.EC.DE vs F.EC.6M",
           .default = NA)

first_ancestor_count <- FILE.KEGG.HSJD.sel.EC %>%
  group_by(GROUP.rec, interaction_name) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function with first_ancestor_count_2
FILE.KEGG.HSJD.sel.EC <- merge(FILE.KEGG.HSJD.sel.EC, first_ancestor_count, by = c("interaction_name","GROUP.rec"), all.x = TRUE)


unique_combinations <- FILE.KEGG.HSJD.sel.EC %>%
  distinct(GROUP.rec,interaction_name, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))

FILE.KEGG.HSJD.sel.EC <- left_join(FILE.KEGG.HSJD.sel.EC, unique_combinations, by = "GROUP.rec")

FILE.KEGG.HSJD.sel.EC <- FILE.KEGG.HSJD.sel.EC %>%
  mutate(gene_rate = count / total_counts)

mean_EA <- FILE.KEGG.HSJD.sel.EC %>%
  group_by(GROUP.rec,interaction_name) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))
FILE.KEGG.HSJD.sel.EC <- left_join(FILE.KEGG.HSJD.sel.EC, mean_EA, by = c("GROUP.rec", "interaction_name"))

ggballoonplot(FILE.KEGG.HSJD.sel.EC, x = "KEGG_up_DOWN", y = "interaction_name", size = "gene_rate",
              fill = "mean_EA_VALUE", facet.by = "GROUP.rec",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "KEGG ontologies",
                                                                                   x= "UP/DOWN",
                                                                                   title="Goup: EC") +
  # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 5, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

FILE.KEGG.HSJD.sel.CU<- FILE.KEGG.HSJD[(FILE.KEGG.HSJD$GROUP=="fecal.Control.control_vs_fecal.CU.3.meses" |
                                       FILE.KEGG.HSJD$GROUP=="fecal.Control.control_vs_fecal.CU.6.meses" | 
                                       FILE.KEGG.HSJD$GROUP=="fecal.Control.control_vs_fecal.CU.debut" |
                                       FILE.KEGG.HSJD$GROUP=="fecal.CU.6.meses_vs_fecal.CU.debut" ) ,]

#recode group names
FILE.KEGG.HSJD.sel.CU$GROUP.rec<- case_match(FILE.KEGG.HSJD.sel.CU$GROUP, "fecal.Control.control_vs_fecal.CU.3.meses" ~ "F.C vs F.CU.3M",                       
           "fecal.Control.control_vs_fecal.CU.6.meses" ~ "F.C vs F.CU.6M",
           "fecal.Control.control_vs_fecal.CU.debut"~ "F.C vs F.CU.DE",
           "fecal.CU.6.meses_vs_fecal.CU.debut"~ "F.CU.6M vs F.CU.DE",
           .default = NA)

first_ancestor_count <- FILE.KEGG.HSJD.sel.CU %>%
  group_by(GROUP.rec, interaction_name) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function with first_ancestor_count_2
FILE.KEGG.HSJD.sel.CU <- merge(FILE.KEGG.HSJD.sel.CU, first_ancestor_count, by = c("interaction_name","GROUP.rec"), all.x = TRUE)


unique_combinations <- FILE.KEGG.HSJD.sel.CU %>%
  distinct(GROUP.rec,interaction_name, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))

FILE.KEGG.HSJD.sel.CU <- left_join(FILE.KEGG.HSJD.sel.CU, unique_combinations, by = "GROUP.rec")

FILE.KEGG.HSJD.sel.CU <- FILE.KEGG.HSJD.sel.CU %>%
  mutate(gene_rate = count / total_counts)

mean_EA <- FILE.KEGG.HSJD.sel.CU %>%
  group_by(GROUP.rec,interaction_name) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))
FILE.KEGG.HSJD.sel.CU <- left_join(FILE.KEGG.HSJD.sel.CU, mean_EA, by = c("GROUP.rec", "interaction_name"))

ggballoonplot(FILE.KEGG.HSJD.sel.CU, x = "KEGG_up_DOWN", y = "interaction_name", size = "gene_rate",
              fill = "mean_EA_VALUE", facet.by = "GROUP.rec",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "KEGG ontologies",
                                                                                   x= "UP/DOWN",
                                                                                   title="Goup: EC") +
  # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 5, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#PARA CADA GRUPO EC
FILE.KEGG.HSJD.sel.EC<- FILE.KEGG.HSJD[(FILE.KEGG.HSJD$GROUP=="fecal.EC.3.meses_vs_fecal.Control.control" |
                                       FILE.KEGG.HSJD$GROUP=="fecal.EC.3.meses_vs_fecal.EC.6.meses" |
                                       FILE.KEGG.HSJD$GROUP=="fecal.EC.3.meses_vs_fecal.EC.debut" |
                                       FILE.KEGG.HSJD$GROUP=="fecal.EC.debut_vs_fecal.EC.6.meses") ,]

#recode group names
FILE.KEGG.HSJD.sel.EC$GROUP.rec<- case_match(FILE.KEGG.HSJD.sel.EC$GROUP, "fecal.EC.3.meses_vs_fecal.Control.control" ~ "F.EC.3M vs F.C",                           
           "fecal.EC.3.meses_vs_fecal.EC.6.meses" ~ "F.EC.3M vs F.EC.6M",
           "fecal.EC.3.meses_vs_fecal.EC.debut"~ "F.EC.3M vs F.EC.DE",
           "fecal.EC.debut_vs_fecal.EC.6.meses"~ "F.EC.DE vs F.EC.6M",
           .default = NA)

first_ancestor_count <- FILE.KEGG.HSJD.sel.EC %>%
  group_by(GROUP.rec, reaction_name) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function with first_ancestor_count_2
FILE.KEGG.HSJD.sel.EC <- merge(FILE.KEGG.HSJD.sel.EC, first_ancestor_count, by = c("reaction_name","GROUP.rec"), all.x = TRUE)


unique_combinations <- FILE.KEGG.HSJD.sel.EC %>%
  distinct(GROUP.rec,reaction_name, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))

FILE.KEGG.HSJD.sel.EC <- left_join(FILE.KEGG.HSJD.sel.EC, unique_combinations, by = "GROUP.rec")

FILE.KEGG.HSJD.sel.EC <- FILE.KEGG.HSJD.sel.EC %>%
  mutate(gene_rate = count / total_counts)

mean_EA <- FILE.KEGG.HSJD.sel.EC %>%
  group_by(GROUP.rec,reaction_name) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))
FILE.KEGG.HSJD.sel.EC <- left_join(FILE.KEGG.HSJD.sel.EC, mean_EA, by = c("GROUP.rec", "reaction_name"))

ggballoonplot(FILE.KEGG.HSJD.sel.EC, x = "KEGG_up_DOWN", y = "reaction_name", size = "gene_rate",
              fill = "mean_EA_VALUE", facet.by = "GROUP.rec",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "KEGG ontologies",
                                                                                   x= "UP/DOWN",
                                                                                   title="Goup: EC") +
  # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 5, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

FILE.KEGG.HSJD.sel.CU<- FILE.KEGG.HSJD[(FILE.KEGG.HSJD$GROUP=="fecal.Control.control_vs_fecal.CU.3.meses" |
                                       FILE.KEGG.HSJD$GROUP=="fecal.Control.control_vs_fecal.CU.6.meses" | 
                                       FILE.KEGG.HSJD$GROUP=="fecal.Control.control_vs_fecal.CU.debut" |
                                       FILE.KEGG.HSJD$GROUP=="fecal.CU.6.meses_vs_fecal.CU.debut" ) ,]

#recode group names
FILE.KEGG.HSJD.sel.CU$GROUP.rec<- case_match(FILE.KEGG.HSJD.sel.CU$GROUP, "fecal.Control.control_vs_fecal.CU.3.meses" ~ "F.C vs F.CU.3M",                       
           "fecal.Control.control_vs_fecal.CU.6.meses" ~ "F.C vs F.CU.6M",
           "fecal.Control.control_vs_fecal.CU.debut"~ "F.C vs F.CU.DE",
           "fecal.CU.6.meses_vs_fecal.CU.debut"~ "F.CU.6M vs F.CU.DE",
           .default = NA)

first_ancestor_count <- FILE.KEGG.HSJD.sel.CU %>%
  group_by(GROUP.rec, reaction_name) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function with first_ancestor_count_2
FILE.KEGG.HSJD.sel.CU <- merge(FILE.KEGG.HSJD.sel.CU, first_ancestor_count, by = c("reaction_name","GROUP.rec"), all.x = TRUE)


unique_combinations <- FILE.KEGG.HSJD.sel.CU %>%
  distinct(GROUP.rec,reaction_name, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))

FILE.KEGG.HSJD.sel.CU <- left_join(FILE.KEGG.HSJD.sel.CU, unique_combinations, by = "GROUP.rec")

FILE.KEGG.HSJD.sel.CU <- FILE.KEGG.HSJD.sel.CU %>%
  mutate(gene_rate = count / total_counts)

mean_EA <- FILE.KEGG.HSJD.sel.CU %>%
  group_by(GROUP.rec,reaction_name) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))
FILE.KEGG.HSJD.sel.CU <- left_join(FILE.KEGG.HSJD.sel.CU, mean_EA, by = c("GROUP.rec", "reaction_name"))

ggballoonplot(FILE.KEGG.HSJD.sel.CU, x = "KEGG_up_DOWN", y = "reaction_name", size = "gene_rate",
              fill = "mean_EA_VALUE", facet.by = "GROUP.rec",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "KEGG ontologies",
                                                                                   x= "UP/DOWN",
                                                                                   title="Goup: EC") +
  # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 5, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#de momento descartado, analisis de frecuencias parciales
########################################################## ANALISIS DE TABLAS DE CONTINGENCIA CON TABLAS AGRUPADAS DE ALBERT MARTIN DONDE SE PRESENTAN LAS SUMAS DE LAS ONTOLOGIES (KEGG = UP + DOWN)

#CU_entries ############################################################

#EJEMPLO CON CU TODOS LOS GRUPOS SIGNIFICATIVOS A UN NIVEL P<0.001 PARA FDR, DONDE SE AGRUPAN ONTOLOGIES DE UP + DOWN Y DEPUES SE REAGRUPAN POR GRUPOS DE ORDEN SUPERIOR PARA KEGG
#setwd("C:/Users/Antonio Monleon/Dropbox/otros analisis estadisticos 2015/HSJD 2017 javier martin de carpi SEQUENTIA/estudio_HSJD_2019_2020_fis_bioinformatica/2019_2020/analisis_funcional_2022_andreuPay/functional_analysis_albert11_23")
#CU_entries <- read.csv("CU_entries.csv") #ONTOLOGIES DEL kegg para grupo de CU

#convert data frame in contingency table
#tabledata <- t(xtabs(COUNT ~ ., CU_entries))
#chisq.test((tabledata )) #p>0.05 ->0.5823
#fisher.test(tabledata,simulate.p.value=TRUE ) #p>0.05 -> 0.5823


#library(vcdExtra)
#mosaic(tabledata, gp = shading_max, 
#       split_vertical = TRUE, 
#       main="KEGG_CU")


#Mosaic plot for the Arthritis data, using shading_max
#gp = shading_max specifies that color in the plot signals a significant residual at a 90% or 99% significance level, with the more intense shade for 99%. Note that the residuals for the independence model were not large (as shown in the legend), yet the association between Treatment and Improved is highly significant.

#library(ca)
#fit <- ca(tabledata)
#print(fit) # basic results
#summary(fit) # extended results
#plot(fit) # symmetric map
#plot(fit, mass = TRUE, contrib = "absolute", map =
#       "rowgreen", arrows = c(FALSE, TRUE)) # asymmetric map

ANALISIS ESTADISTICO DE LA GO

La Gene Ontology (GO), es una iniciativa fundamental para unificar la representación de los atributos de genes y productos génicos en todas las especies. Funciona como un sistema de clasificación estandarizado, similar a una taxonomía para genes, que permite describir sus funciones dentro de las células y organismos. En concreto, organiza la información sobre genes en tres categorías principales:

  1. Función molecular: Describe la actividad bioquímica específica de un gen, como la capacidad de una enzima para catalizar una reacción química.
  2. Componente celular: Indica la localización subcelular de un gen o su producto génico, como la presencia en la membrana celular o el núcleo.
  3. Proceso biológico: Define el papel de un gen en un proceso biológico mÔs amplio, como la división celular o la respuesta inmunitaria.

Cada término de la GO tiene una identificación única y se relaciona con otros términos mediante relaciones jerÔrquicas (padre-hijo) y no jerÔrquicas (parte-parte), creando una estructura similar a un Ôrbol de conocimiento. Esto permite navegar y comparar fÔcilmente las funciones de genes a través de diferentes especies y estudios.

Se parte de un fichero filtado que contiene las GO (GOEA__ANALISIS_GO2023_sep23_1_OBTENCIONTABLA_DATOS_RDA.r) . Filtro : P-valor < 0.001 (FDR) Fichero filtrado con GO: FILE.GOEA.HSJD.rda

#############################8/11/2023 // 19-9-2023###################
# STATISTICAL ANALYSIS
######################################################################

#UPLOAD UNA VEZ SE HA OBTENIDO EL FICHERO
#setwd("C:/Users/Antonio Monleon/Dropbox/otros analisis estadisticos 2015/HSJD 2017 javier martin de carpi SEQUENTIA/estudio_HSJD_2019_2020_fis_bioinformatica/2019_2020/analisis_funcional_2022_andreuPay")
#setwd("C:/Users/ub-biost-laptop-01/Dropbox/otros analisis estadisticos 2015/HSJD 2017 javier martin de carpi SEQUENTIA/estudio_HSJD_2019_2020_fis_bioinformatica/2019_2020/analisis_funcional_2022_andreuPay")
#setwd("C:/Users/Antonio Monleon/Dropbox/otros analisis estadisticos 2015/HSJD 2017 javier martin de carpi SEQUENTIA/estudio_HSJD_2019_2020_fis_bioinformatica/2019_2020/analisis_funcional_2022_andreuPay")

load("/Users/alex/Downloads/analisis_functional/FILE.GOEA.HSJD.rda") #N=24041 OBSERVACIONES

#table(FILE.GOEA.HSJD$GROUP) #N=#CONTAR LOS GRUPOS EXPERIMENTALES --> 93
#los grupos que salen son:
#fecal.Control.control_vs_fecal.CU.3.meses     fecal.Control.control_vs_fecal.CU.6.meses       fecal.Control.control_vs_fecal.CU.debut     fecal.Control.control_vs_fecal.EC.6.meses 
#302                                           234                                           268                                           279 
#fecal.Control.control_vs_fecal.EC.debut      fecal.Control.control_vs_oral.CU.3.meses      fecal.Control.control_vs_oral.CU.6.meses        fecal.Control.control_vs_oral.CU.debut 
#332                                           307                                           313                                           336 
#fecal.Control.control_vs_oral.EC.3.meses      fecal.Control.control_vs_oral.EC.6.meses        fecal.Control.control_vs_oral.EC.debut            fecal.CU.3.meses_vs_fecal.CU.debut 
#340                                           323                                           338                                           262 
#fecal.CU.3.meses_vs_fecal.EC.6.meses             fecal.CU.3.meses_vs_oral.CU.debut           fecal.CU.3.meses_vs_oral.EC.6.meses             fecal.CU.3.meses_vs_oral.EC.debut 
#259                                           330                                           330                                           322 
#fecal.CU.6.meses_vs_fecal.CU.3.meses            fecal.CU.6.meses_vs_fecal.CU.debut          fecal.CU.6.meses_vs_fecal.EC.6.meses            fecal.CU.6.meses_vs_fecal.EC.debut 
#131                                           255                                           219                                           270 
#fecal.CU.6.meses_vs_oral.CU.debut           fecal.CU.6.meses_vs_oral.EC.3.meses           fecal.CU.6.meses_vs_oral.EC.6.meses             fecal.CU.6.meses_vs_oral.EC.debut 
#301                                           294                                           288                                           294 
#fecal.CU.debut_vs_oral.EC.debut     fecal.EC.3.meses_vs_fecal.Control.control          fecal.EC.3.meses_vs_fecal.CU.3.meses          fecal.EC.3.meses_vs_fecal.CU.6.meses 
#298                                           292                                           300                                           254 
#fecal.EC.3.meses_vs_fecal.CU.debut          fecal.EC.3.meses_vs_fecal.EC.6.meses            fecal.EC.3.meses_vs_fecal.EC.debut           fecal.EC.3.meses_vs_oral.CU.3.meses 
#250                                           256                                           305                                           298 
#fecal.EC.3.meses_vs_oral.CU.6.meses             fecal.EC.3.meses_vs_oral.CU.debut           fecal.EC.3.meses_vs_oral.EC.3.meses           fecal.EC.3.meses_vs_oral.EC.6.meses 
#289                                           314                                           325                                           306 
#fecal.EC.3.meses_vs_oral.EC.debut            fecal.EC.6.meses_vs_fecal.CU.debut             fecal.EC.6.meses_vs_oral.CU.debut           fecal.EC.6.meses_vs_oral.EC.6.meses 
#336                                           271                                           321                                           309 
#fecal.EC.6.meses_vs_oral.EC.debut            fecal.EC.debut_vs_fecal.CU.3.meses              fecal.EC.debut_vs_fecal.CU.debut            fecal.EC.debut_vs_fecal.EC.6.meses 
#342                                           321                                           242                                           285 
#fecal.EC.debut_vs_oral.CU.debut             fecal.EC.debut_vs_oral.EC.3.meses             fecal.EC.debut_vs_oral.EC.6.meses               fecal.EC.debut_vs_oral.EC.debut 
#329                                           339                                           308                                           337 
#oral.Control.control_vs_fecal.Control.control      oral.Control.control_vs_fecal.CU.3.meses      oral.Control.control_vs_fecal.CU.6.meses        oral.Control.control_vs_fecal.CU.debut 
#363                                           352                                           325                                           309 
#oral.Control.control_vs_fecal.EC.3.meses      oral.Control.control_vs_fecal.EC.6.meses        oral.Control.control_vs_fecal.EC.debut       oral.Control.control_vs_oral.CU.3.meses 
#346                                           343                                           344                                           146 
#oral.Control.control_vs_oral.CU.6.meses         oral.Control.control_vs_oral.CU.debut       oral.Control.control_vs_oral.EC.3.meses       oral.Control.control_vs_oral.EC.6.meses 
#157                                           162                                           122                                           126 
#oral.Control.control_vs_oral.EC.debut           oral.CU.3.meses_vs_fecal.CU.3.meses           oral.CU.3.meses_vs_fecal.CU.6.meses             oral.CU.3.meses_vs_fecal.CU.debut 
#260                                           294                                           268                                           270 
#oral.CU.3.meses_vs_fecal.EC.6.meses             oral.CU.3.meses_vs_fecal.EC.debut            oral.CU.3.meses_vs_oral.CU.6.meses              oral.CU.3.meses_vs_oral.CU.debut 
#292                                           297                                            98                                            97 
#oral.CU.3.meses_vs_oral.EC.3.meses            oral.CU.3.meses_vs_oral.EC.6.meses              oral.CU.3.meses_vs_oral.EC.debut           oral.CU.6.meses_vs_fecal.CU.3.meses 
#120                                           108                                           186                                           304 
#oral.CU.6.meses_vs_fecal.CU.6.meses             oral.CU.6.meses_vs_fecal.CU.debut           oral.CU.6.meses_vs_fecal.EC.6.meses             oral.CU.6.meses_vs_fecal.EC.debut 
#293                                           252                                           290                                           275 
#oral.CU.6.meses_vs_oral.CU.debut            oral.CU.6.meses_vs_oral.EC.3.meses            oral.CU.6.meses_vs_oral.EC.6.meses              oral.CU.6.meses_vs_oral.EC.debut 
#134                                           131                                            87                                           231 
#oral.CU.debut_vs_fecal.CU.debut              oral.CU.debut_vs_oral.EC.6.meses                oral.CU.debut_vs_oral.EC.debut           oral.EC.3.meses_vs_fecal.CU.3.meses 
#288                                           121                                           214                                           320 
#oral.EC.3.meses_vs_fecal.CU.debut           oral.EC.3.meses_vs_fecal.EC.6.meses              oral.EC.3.meses_vs_oral.CU.debut            oral.EC.3.meses_vs_oral.EC.6.meses 
#278                                           325                                           150                                           135 
#oral.EC.3.meses_vs_oral.EC.debut             oral.EC.6.meses_vs_fecal.CU.debut              oral.EC.6.meses_vs_oral.EC.debut 
#180                                           295                                           199 

#frequency(FILE.GOEA.HSJD$GROUP)

#GRUPOS EXPERIMENTALES VALIDOS SON:
 

FILE.GOEA.HSJD.sel<- FILE.GOEA.HSJD[(FILE.GOEA.HSJD$GROUP=="fecal.Control.control_vs_fecal.CU.3.meses" |
                                       FILE.GOEA.HSJD$GROUP=="fecal.Control.control_vs_fecal.CU.6.meses"           |                               
                                       FILE.GOEA.HSJD$GROUP=="fecal.Control.control_vs_fecal.CU.debut"     |
                                       FILE.GOEA.HSJD$GROUP=="fecal.Control.control_vs_fecal.EC.6.meses" |
                                       FILE.GOEA.HSJD$GROUP=="fecal.Control.control_vs_fecal.EC.debut"    |            
                                       FILE.GOEA.HSJD$GROUP=="fecal.CU.3.meses_vs_fecal.CU.debut" |
                                       FILE.GOEA.HSJD$GROUP=="fecal.CU.6.meses_vs_fecal.CU.3.meses"  |          
                                       FILE.GOEA.HSJD$GROUP=="fecal.CU.6.meses_vs_fecal.CU.debut" |
                                       FILE.GOEA.HSJD$GROUP=="fecal.EC.3.meses_vs_fecal.Control.control" |
                                       FILE.GOEA.HSJD$GROUP=="fecal.EC.3.meses_vs_fecal.EC.6.meses" |
                                       FILE.GOEA.HSJD$GROUP=="fecal.EC.3.meses_vs_fecal.EC.debut"  |         
                                       FILE.GOEA.HSJD$GROUP=="fecal.EC.debut_vs_fecal.EC.6.meses" |
                                       FILE.GOEA.HSJD$GROUP=="oral.Control.control_vs_oral.CU.3.meses" |
                                       FILE.GOEA.HSJD$GROUP=="oral.Control.control_vs_oral.CU.6.meses"  |       
                                       FILE.GOEA.HSJD$GROUP=="oral.Control.control_vs_oral.CU.debut" |
                                       FILE.GOEA.HSJD$GROUP=="oral.Control.control_vs_oral.EC.3.meses"  |     
                                       FILE.GOEA.HSJD$GROUP=="oral.Control.control_vs_oral.EC.6.meses" |
                                       FILE.GOEA.HSJD$GROUP=="oral.Control.control_vs_oral.EC.debut"  |         
                                       FILE.GOEA.HSJD$GROUP=="oral.CU.3.meses_vs_oral.CU.6.meses" |             
                                       FILE.GOEA.HSJD$GROUP=="oral.CU.3.meses_vs_oral.CU.debut" |
                                       FILE.GOEA.HSJD$GROUP=="oral.CU.6.meses_vs_oral.CU.debut" |                     
                                       FILE.GOEA.HSJD$GROUP=="oral.EC.3.meses_vs_oral.EC.6.meses" |
                                       FILE.GOEA.HSJD$GROUP=="oral.EC.3.meses_vs_oral.EC.debut"  |           
                                       FILE.GOEA.HSJD$GROUP=="oral.EC.6.meses_vs_oral.EC.debut"
                                       ) ,]
#remove NA


FILE.GOEA.HSJD.sel<-FILE.GOEA.HSJD.sel[!is.na(FILE.GOEA.HSJD.sel$ONT_DESCRIPTION),] #no aparecen unos misteriosos NA,
#write.csv(FILE.GOEA.HSJD.sel, "FILE.GOEA.HSJD.sel.csv", row.names=FALSE)

# Initialize "Disease" column in FILE.GOEA.HSJD.sel with NA
FILE.GOEA.HSJD.sel$Disease <- NA

# Define diseases of interest
diseases <- c("CU", "EC")

# Assign disease to "Disease" column if there is a match in the "GROUP" column
for (disease in diseases) {
  FILE.GOEA.HSJD.sel$Disease <- ifelse(grepl(disease, FILE.GOEA.HSJD.sel$GROUP), disease, FILE.GOEA.HSJD.sel$Disease)
}


# Define function to retrieve ancestors of ontologies

ancestors <- function(ontologies, groups) {
  data <- list()
  contador <- 1
  
  for (i in seq_along(ontologies)) {
    ontology <- as.character(ontologies[i])
    group <- as.character(groups[i])
    
    # Construct URL to retrieve ancestors of ontology
    url <- sprintf("https://www.ebi.ac.uk/QuickGO/services/ontology/go/terms/%s/ancestors?relations=is_a%%2Cpart_of%%2Coccurs_in%%2Cregulates",
                   URLencode(ontology, reserved = TRUE))
    
    # Perform HTTP request and parse JSON response
    response <- content(GET(url, accept("application/json")), "parsed")
    
    # Check if the ontology is obsolete
    if (!response$results[[1]]$isObsolete) {
      ancestors <- response$results[[1]]$ancestors
      ancestors <- setdiff(ancestors, ontology)
      # Store data in list
      data[[contador]] <- data.frame(Group = group, Ancestors = toString(ancestors), Ontology = ontology)
      contador <- contador + 1
    }
  }
  
  # Combine dataframes into a single dataframe
  data_df <- do.call(rbind, data)
  
  # Return the dataframe
  return(data_df)
}

# Read the generated CSV file
FILE.GOEA.ANCESTORS <- ancestors(FILE.GOEA.HSJD.sel$ONTOLOGY, FILE.GOEA.HSJD.sel$GROUP)

# Rename columns of the FILE.GOEA.ANCESTORS dataframe
colnames(FILE.GOEA.ANCESTORS) <- c("GROUP", "ANCESTORS", "ONTOLOGY")

# Process "ANCESTORS" columns in FILE.GOEA.ANCESTORS
FILE.GOEA.ANCESTORS$ANCESTORS <- lapply(FILE.GOEA.ANCESTORS$ANCESTORS, function(x) {
  x <- unlist(strsplit(x, ", "))
  x <- x[!x %in% c("GO:0003674", "GO:0005575", "GO:0008150")]
  x <- trimws(x)  
  paste(x, collapse = ", ")
})

# Combine FILE.GOEA.HSJD.sel and FILE.GOEA.ANCESTORS by "ONTOLOGY"
FILE.GOEA.HSJD.sel <- merge(FILE.GOEA.HSJD.sel, FILE.GOEA.ANCESTORS[,2:3], by = "ONTOLOGY", all.x = TRUE)

# Summarize FILE.GOEA.HSJD.sel
FILE.GOEA.HSJD.sel <- FILE.GOEA.HSJD.sel %>%
  group_by(GROUP, ONTOLOGY, GOEA_up_DOWN) %>%
  summarise(
    ONT_DESCRIPTION = unique(ONT_DESCRIPTION),
    Disease = unique(Disease),
    EA_VALUE = mean(EA_VALUE),
    sample = toString(unique(sample)),
    GOEA_up_DOWN = unique(GOEA_up_DOWN),
    ANCESTORS = toString(unique(ANCESTORS))
  ) %>%
  filter(ANCESTORS != "NA" & ANCESTORS != "")

# Define function to retrieve children of a GO term
get_children_quickgo <- function(go_id) {
  # Construct URL to retrieve children of a GO term
  base_url <- "https://www.ebi.ac.uk/QuickGO/services/ontology/go/terms/"
  url <- paste0(base_url, go_id, "/children")
  response <- httr::GET(url)
  if (httr::http_type(response) == "application/json") {
    children <- httr::content(response, "parsed")
    return(children)
  } else {
    stop("Error: The response is not in JSON format.")
  }               
}

# Define function to retrieve information of children of a GO term
get_children_info <- function(go_term) {
  children_quickgo <- get_children_quickgo(go_term)
  children_list <- children_quickgo$results
  children_df <- data.frame(id = character(), name = character(), stringsAsFactors = FALSE)
  for (child_info in children_list[[1]]$children) {
    child_id <- child_info$id
    child_name <- child_info$name
    child_df <- data.frame(id = child_id, name = child_name, stringsAsFactors = FALSE)
    children_df <- rbind(children_df, child_df)
  }
  return(children_df)
}

# Get children information for different ontology types
children_info_mf <- get_children_info("GO:0003674")
children_info_bp <- get_children_info("GO:0008150")
children_info_cc <- get_children_info("GO:0005575")

# Define function to find the first matching ancestor
find_first_matching_ancestor <- function(ancestors) {
  for (ancestor in ancestors) {
    if (ancestor %in% children_info_mf$id) {
      return(ancestor)
    } else if (ancestor %in% children_info_bp$id) {
      return(ancestor)
    } else if (ancestor %in% children_info_cc$id) {
      return(ancestor)
    }
  }
  return(NA)
}

# Apply function to find the first matching ancestor to FILE.GOEA.HSJD.sel
FILE.GOEA.HSJD.sel <- FILE.GOEA.HSJD.sel %>%
  mutate(first_ancestor = sapply(strsplit(ANCESTORS, ", "), find_first_matching_ancestor))

# Define function to get the name of the first ancestor
get_first_ancestor_name <- function(ancestor_id) {
  if (ancestor_id %in% children_info_mf$id) {
    return(children_info_mf$name[children_info_mf$id == ancestor_id])
  } else if (ancestor_id %in% children_info_bp$id) {
    return(children_info_bp$name[children_info_bp$id == ancestor_id])
  } else if (ancestor_id %in% children_info_cc$id) {
    return(children_info_cc$name[children_info_cc$id == ancestor_id])
  } else {
    return(NA)
  }
}

# Apply function to get the name of the first ancestor to FILE.GOEA.HSJD.sel
FILE.GOEA.HSJD.sel <- FILE.GOEA.HSJD.sel %>%
  mutate(first_ancestor_name = sapply(first_ancestor, get_first_ancestor_name))

# Display first few rows of FILE.GOEA.HSJD.sel
head(FILE.GOEA.HSJD.sel)
## # A tibble: 6 Ɨ 10
## # Groups:   GROUP, ONTOLOGY [5]
##   GROUP  ONTOLOGY GOEA_up_DOWN ONT_DESCRIPTION Disease EA_VALUE sample ANCESTORS
##   <chr>  <chr>    <chr>        <chr>           <chr>      <dbl> <chr>  <chr>    
## 1 fecal… GO:0000… DOWN         molecular_func… CU       3.81e-5 FECAL  GO:00036…
## 2 fecal… GO:0000… DOWN         molecular_func… CU       1.73e-7 FECAL  GO:00903…
## 3 fecal… GO:0000… DOWN         molecular_func… CU       0       FECAL  GO:00195…
## 4 fecal… GO:0000… UP           molecular_func… CU       1.45e-9 FECAL  GO:00195…
## 5 fecal… GO:0000… DOWN         molecular_func… CU       3.70e-9 FECAL  GO:00468…
## 6 fecal… GO:0000… DOWN         biological_pro… CU       3.80e-8 FECAL  GO:00903…
## # ℹ 2 more variables: first_ancestor <chr>, first_ancestor_name <chr>
#Filter the file to extract the EC samples
FILE.GOEA.HSJD.sel.EC <- FILE.GOEA.HSJD.sel[(FILE.GOEA.HSJD.sel$Disease=="EC"),]

#Filter the file to separate the FECAL and ORAL EC samples
FILE.GOEA.HSJD.sel.EC.FECAL <- FILE.GOEA.HSJD.sel[(FILE.GOEA.HSJD.sel$Disease=="EC" &
                                                    FILE.GOEA.HSJD.sel$sample=="FECAL"),]
FILE.GOEA.HSJD.sel.EC.ORAL <- FILE.GOEA.HSJD.sel[(FILE.GOEA.HSJD.sel$Disease=="EC" &
                                                    FILE.GOEA.HSJD.sel$sample=="ORAL"),]

#Recode the GROUP column names to an easy representation
FILE.GOEA.HSJD.sel.EC.FECAL$GROUP.rec<- case_match(FILE.GOEA.HSJD.sel.EC.FECAL$GROUP, 
                                                   "fecal.Control.control_vs_fecal.EC.6.meses" ~ "F.C vs F.EC.6M",                           
                                                   "fecal.Control.control_vs_fecal.EC.debut" ~ "F.C vs F.EC.DE",
                                                   "fecal.EC.3.meses_vs_fecal.Control.control"~ "F.EC.3M vs F.C",
                                                   "fecal.EC.3.meses_vs_fecal.EC.6.meses"~ "F.EC.3M vs F.EC.6M",
                                                   "fecal.EC.3.meses_vs_fecal.EC.debut"~ "F.EC.3M vs F.EC.DE",
                                                   "fecal.EC.debut_vs_fecal.EC.6.meses"~ "F.EC.DE vs F.EC.6M",
                                                   .default = NA)

FILE.GOEA.HSJD.sel.EC.ORAL$GROUP.rec<- case_match(FILE.GOEA.HSJD.sel.EC.ORAL$GROUP, 
                                                   "oral.Control.control_vs_oral.EC.3.meses" ~ "O.C vs O.EC.3M",                           
                                                   "oral.Control.control_vs_oral.EC.6.meses" ~ "O.C vs O.EC.6M",
                                                   "oral.Control.control_vs_oral.EC.debut"~ "O.C vs O.EC.DE",
                                                   "oral.EC.3.meses_vs_oral.EC.6.meses"~ "O.EC.3M vs O.EC.6M",
                                                   "oral.EC.3.meses_vs_oral.EC.debut"~ "O.EC.3M vs O.EC.DE",
                                                   "oral.EC.6.meses_vs_oral.EC.debut"~ "O.EC.6M vs O.EC.DE",
                                                   .default = NA)

#Separate the ontologies in the 3 biggest groups that you can find in the ont_description column for both files
FILE.GOEA.HSJD.sel.EC.FECAL.molecular_function<- FILE.GOEA.HSJD.sel.EC.FECAL[(FILE.GOEA.HSJD.sel.EC.FECAL$ONT_DESCRIPTION=="molecular_function" & FILE.GOEA.HSJD.sel.EC.FECAL$EA_VALUE>0 ) ,]
FILE.GOEA.HSJD.sel.EC.FECAL.cellular_function<- FILE.GOEA.HSJD.sel.EC.FECAL[(FILE.GOEA.HSJD.sel.EC.FECAL$ONT_DESCRIPTION=="cellular_component" & FILE.GOEA.HSJD.sel.EC.FECAL$EA_VALUE>0) ,]
FILE.GOEA.HSJD.sel.EC.FECAL.biological_process<- FILE.GOEA.HSJD.sel.EC.FECAL[(FILE.GOEA.HSJD.sel.EC.FECAL$ONT_DESCRIPTION=="biological_process" & FILE.GOEA.HSJD.sel.EC.FECAL$EA_VALUE>0) ,]

FILE.GOEA.HSJD.sel.EC.ORAL.molecular_function<- FILE.GOEA.HSJD.sel.EC.ORAL[(FILE.GOEA.HSJD.sel.EC.ORAL$ONT_DESCRIPTION=="molecular_function" & FILE.GOEA.HSJD.sel.EC.ORAL$EA_VALUE>0 ) ,]
FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function<- FILE.GOEA.HSJD.sel.EC.ORAL[(FILE.GOEA.HSJD.sel.EC.ORAL$ONT_DESCRIPTION=="cellular_component" & FILE.GOEA.HSJD.sel.EC.ORAL$EA_VALUE>0) ,]
FILE.GOEA.HSJD.sel.EC.ORAL.biological_process<- FILE.GOEA.HSJD.sel.EC.ORAL[(FILE.GOEA.HSJD.sel.EC.ORAL$ONT_DESCRIPTION=="biological_process" & FILE.GOEA.HSJD.sel.EC.ORAL$EA_VALUE>0) ,]
#Filter the file to extract the CU samples
FILE.GOEA.HSJD.sel.CU <- FILE.GOEA.HSJD.sel[(FILE.GOEA.HSJD.sel$Disease=="CU"),]

#Filter the file to separate the FECAL and ORAL CU samples
FILE.GOEA.HSJD.sel.CU.ORAL <- FILE.GOEA.HSJD.sel[(FILE.GOEA.HSJD.sel$Disease=="CU" &
                                                    FILE.GOEA.HSJD.sel$sample=="ORAL"),]
FILE.GOEA.HSJD.sel.CU.FECAL <- FILE.GOEA.HSJD.sel[(FILE.GOEA.HSJD.sel$Disease=="CU" &
                                                    FILE.GOEA.HSJD.sel$sample=="FECAL"),]

#Recode the GROUP column names to an easy representation
FILE.GOEA.HSJD.sel.CU.FECAL$GROUP.rec<- case_match(FILE.GOEA.HSJD.sel.CU.FECAL$GROUP, 
           "fecal.Control.control_vs_fecal.CU.3.meses" ~ "F.C vs F.CU.3M",                           
           "fecal.Control.control_vs_fecal.CU.6.meses" ~ "F.C vs F.CU.6M",
           "fecal.Control.control_vs_fecal.CU.debut"~ "F.C vs F.CU.DE",
           "fecal.CU.3.meses_vs_fecal.CU.debut"~ "F.CU.3M vs F.CU.DE",
           "fecal.CU.6.meses_vs_fecal.CU.3.meses"~ "F.CU.6M vs F.CU.3M",
           "fecal.CU.6.meses_vs_fecal.CU.debut"~ "F.CU.6M vs F.CU.DE",
           .default = NA)

FILE.GOEA.HSJD.sel.CU.ORAL$GROUP.rec<- case_match(FILE.GOEA.HSJD.sel.CU.ORAL$GROUP, 
           "oral.Control.control_vs_oral.CU.3.meses" ~ "O.C vs O.CU.3M",                           
           "oral.Control.control_vs_oral.CU.6.meses" ~ "O.C vs O.CU.6M",
           "oral.Control.control_vs_oral.CU.debut"~ "O.C vs O.CU.DE",
           "oral.CU.3.meses_vs_oral.CU.debut"~ "O.CU.3M vs O.CU.DE",
           "oral.CU.3.meses_vs_oral.CU.6.meses"~ "O.CU.3M vs O.CU.6M",
           "oral.CU.6.meses_vs_oral.CU.debut"~ "O.CU.6M vs O.CU.DE",
           .default = NA)

#Separate the ontologies in the 3 biggest groups that you can find in the ont_description column for both files
FILE.GOEA.HSJD.sel.CU.FECAL.molecular_function<- FILE.GOEA.HSJD.sel.CU.FECAL[(FILE.GOEA.HSJD.sel.CU.FECAL$ONT_DESCRIPTION=="molecular_function" & FILE.GOEA.HSJD.sel.CU.FECAL$EA_VALUE>0 ) ,]
FILE.GOEA.HSJD.sel.CU.FECAL.cellular_function<- FILE.GOEA.HSJD.sel.CU.FECAL[(FILE.GOEA.HSJD.sel.CU.FECAL$ONT_DESCRIPTION=="cellular_component" & FILE.GOEA.HSJD.sel.CU.FECAL$EA_VALUE>0) ,]
FILE.GOEA.HSJD.sel.CU.FECAL.biological_process<- FILE.GOEA.HSJD.sel.CU.FECAL[(FILE.GOEA.HSJD.sel.CU.FECAL$ONT_DESCRIPTION=="biological_process" & FILE.GOEA.HSJD.sel.CU.FECAL$EA_VALUE>0) ,]

FILE.GOEA.HSJD.sel.CU.ORAL.molecular_function<- FILE.GOEA.HSJD.sel.CU.ORAL[(FILE.GOEA.HSJD.sel.CU.ORAL$ONT_DESCRIPTION=="molecular_function" & FILE.GOEA.HSJD.sel.CU.ORAL$EA_VALUE>0 ) ,]
FILE.GOEA.HSJD.sel.CU.ORAL.cellular_function<- FILE.GOEA.HSJD.sel.CU.ORAL[(FILE.GOEA.HSJD.sel.CU.ORAL$ONT_DESCRIPTION=="cellular_component" & FILE.GOEA.HSJD.sel.CU.ORAL$EA_VALUE>0) ,]
FILE.GOEA.HSJD.sel.CU.ORAL.biological_process<- FILE.GOEA.HSJD.sel.CU.ORAL[(FILE.GOEA.HSJD.sel.CU.ORAL$ONT_DESCRIPTION=="biological_process" & FILE.GOEA.HSJD.sel.CU.ORAL$EA_VALUE>0) ,]

RESULTADOS GENERALES DE GO

ggplotGrid(ncol = 2,
           lapply(c("ONT_DESCRIPTION", "sample",          "GROUP",          "GOEA_up_DOWN"),
                  function(col) {
                    ggplot(FILE.GOEA.HSJD.sel, aes_string(col)) + geom_bar() + coord_flip()
                  }))

ggplotGrid(ncol = 2,
           lapply(c("ONT_DESCRIPTION"),
                  function(col) {
                    ggplot(FILE.GOEA.HSJD.sel, aes_string(col)) + geom_bar() + coord_flip()
                  }))

Descripción GO para cada grup CU / FECAL

######################################################
#para cada grupo de CU / FECAL
######################################################
first_ancestor_count <- FILE.GOEA.HSJD.sel.CU.FECAL %>%
  group_by(GROUP.rec, ONT_DESCRIPTION) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function with first_ancestor_count_2
FILE.GOEA.HSJD.sel.CU.FECAL <- merge(FILE.GOEA.HSJD.sel.CU.FECAL, first_ancestor_count, by = c("ONT_DESCRIPTION","GROUP.rec"), all.x = TRUE)

unique_combinations <- FILE.GOEA.HSJD.sel.CU.FECAL %>%
  distinct(GROUP.rec,ONT_DESCRIPTION, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))

FILE.GOEA.HSJD.sel.CU.FECAL <- left_join(FILE.GOEA.HSJD.sel.CU.FECAL, unique_combinations, by = "GROUP.rec")

FILE.GOEA.HSJD.sel.CU.FECAL <- FILE.GOEA.HSJD.sel.CU.FECAL %>%
  mutate(gene_rate = count / total_counts)

mean_EA <- FILE.GOEA.HSJD.sel.CU.FECAL %>%
  group_by(GROUP.rec,ONT_DESCRIPTION) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))



# Merge con el dataset original
FILE.GOEA.HSJD.sel.CU.FECAL <- left_join(FILE.GOEA.HSJD.sel.CU.FECAL, mean_EA, by = c("GROUP.rec", "ONT_DESCRIPTION"))

ggballoonplot((FILE.GOEA.HSJD.sel.CU.FECAL), x = "GOEA_up_DOWN", y = "ONT_DESCRIPTION", size =  "gene_rate",
              fill = "mean_EA_VALUE", facet.by = "GROUP.rec",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - CU- MUESTRAS FECALES")+
  # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 5, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

Descripción para CU / ORAL

######################################################
#para cada grupo de CU / ORAL
######################################################

first_ancestor_count <- FILE.GOEA.HSJD.sel.CU.ORAL %>%
  group_by(GROUP.rec, ONT_DESCRIPTION) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function with first_ancestor_count_2
FILE.GOEA.HSJD.sel.CU.ORAL <- merge(FILE.GOEA.HSJD.sel.CU.ORAL, first_ancestor_count, by = c("ONT_DESCRIPTION","GROUP.rec"), all.x = TRUE)

unique_combinations <- FILE.GOEA.HSJD.sel.CU.ORAL %>%
  distinct(GROUP.rec,ONT_DESCRIPTION, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))



FILE.GOEA.HSJD.sel.CU.ORAL <- left_join(FILE.GOEA.HSJD.sel.CU.ORAL, unique_combinations, by = "GROUP.rec")

FILE.GOEA.HSJD.sel.CU.ORAL <- FILE.GOEA.HSJD.sel.CU.ORAL %>%
  mutate(gene_rate = count / total_counts)

mean_EA <- FILE.GOEA.HSJD.sel.CU.ORAL %>%
  group_by(GROUP.rec,ONT_DESCRIPTION) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))



# Merge con el dataset original
FILE.GOEA.HSJD.sel.CU.ORAL <- left_join(FILE.GOEA.HSJD.sel.CU.ORAL, mean_EA, by = c("GROUP.rec", "ONT_DESCRIPTION"))

ggballoonplot((FILE.GOEA.HSJD.sel.CU.ORAL), x = "GOEA_up_DOWN", y = "ONT_DESCRIPTION", size = "gene_rate",
              fill = "mean_EA_VALUE", facet.by = "GROUP.rec",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - CU- MUESTRAS ORALES")+
  # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 5, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

Descripción para cada muestra de EC - FECAL

#EC:

######################################################
#para cada grupo de EC / FECAL
######################################################

first_ancestor_count <- FILE.GOEA.HSJD.sel.EC.FECAL %>%
  group_by(GROUP.rec, ONT_DESCRIPTION) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function with first_ancestor_count_2
FILE.GOEA.HSJD.sel.EC.FECAL <- merge(FILE.GOEA.HSJD.sel.EC.FECAL, first_ancestor_count, by = c("ONT_DESCRIPTION","GROUP.rec"), all.x = TRUE)

unique_combinations <- FILE.GOEA.HSJD.sel.EC.FECAL %>%
  distinct(GROUP.rec,ONT_DESCRIPTION, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))

FILE.GOEA.HSJD.sel.EC.FECAL <- left_join(FILE.GOEA.HSJD.sel.EC.FECAL, unique_combinations, by = "GROUP.rec")

FILE.GOEA.HSJD.sel.EC.FECAL <- FILE.GOEA.HSJD.sel.EC.FECAL %>%
  mutate(gene_rate = count / total_counts)

mean_EA <- FILE.GOEA.HSJD.sel.EC.FECAL %>%
  group_by(GROUP.rec,ONT_DESCRIPTION) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))



# Merge con el dataset original
FILE.GOEA.HSJD.sel.EC.FECAL <- left_join(FILE.GOEA.HSJD.sel.EC.FECAL, mean_EA, by = c("GROUP.rec", "ONT_DESCRIPTION"))

ggballoonplot((FILE.GOEA.HSJD.sel.EC.FECAL), x = "GOEA_up_DOWN", y = "ONT_DESCRIPTION", size = "gene_rate",
              fill = "mean_EA_VALUE", facet.by = "GROUP.rec",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - EC- MUESTRAS FECALES")+
  # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 5, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

Descripción para EC- ORAL

######################################################
#para cada grupo de EC / ORAL
######################################################

first_ancestor_count <- FILE.GOEA.HSJD.sel.EC.ORAL %>%
  group_by(GROUP.rec, ONT_DESCRIPTION) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function with first_ancestor_count_2
FILE.GOEA.HSJD.sel.EC.ORAL <- merge(FILE.GOEA.HSJD.sel.EC.ORAL, first_ancestor_count, by = c("ONT_DESCRIPTION","GROUP.rec"), all.x = TRUE)

unique_combinations <- FILE.GOEA.HSJD.sel.EC.ORAL %>%
  distinct(GROUP.rec,ONT_DESCRIPTION, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))

FILE.GOEA.HSJD.sel.EC.ORAL <- left_join(FILE.GOEA.HSJD.sel.EC.ORAL, unique_combinations, by = "GROUP.rec")

FILE.GOEA.HSJD.sel.EC.ORAL <- FILE.GOEA.HSJD.sel.EC.ORAL %>%
  mutate(gene_rate = count / total_counts)

mean_EA <- FILE.GOEA.HSJD.sel.EC.ORAL %>%
  group_by(GROUP.rec,ONT_DESCRIPTION) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))



# Merge con el dataset original
FILE.GOEA.HSJD.sel.EC.ORAL <- left_join(FILE.GOEA.HSJD.sel.EC.ORAL, mean_EA, by = c("GROUP.rec", "ONT_DESCRIPTION"))

ggballoonplot((FILE.GOEA.HSJD.sel.EC.ORAL), x = "GOEA_up_DOWN", y = "ONT_DESCRIPTION", size = "gene_rate",
              fill = "mean_EA_VALUE", facet.by = "GROUP",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - EC- MUESTRAS ORALES")+
  # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 5, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

ANALISIS ESTADISTICO POR CLASES DE LA GO :

MOLECULAR, BIOLOGICAL I CELULAR

ANALISIS GO grupo CU / FECAL

######################################################
#analisis por clases de GO ---------------------------------------------------------------------------------------------------------------------

#MOLECULAR, BIOLOGICAL I CELULAR
######################################################

#recode group names

ggplotGrid(ncol = 2,
           lapply(c("ONT_DESCRIPTION"),
                  function(col) {
                    ggplot(FILE.GOEA.HSJD.sel.CU.FECAL, aes_string(col)) + geom_bar() + coord_flip()
                  }))

ONTOLOGY_count <- FILE.GOEA.HSJD.sel.CU.FECAL.molecular_function %>%
  group_by(GROUP.rec, ONTOLOGY) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function with first_ancestor_count_2
FILE.GOEA.HSJD.sel.CU.FECAL.molecular_function <- merge(FILE.GOEA.HSJD.sel.CU.FECAL.molecular_function, ONTOLOGY_count, by = c("ONTOLOGY","GROUP.rec"), all.x = TRUE)

unique_combinations <- FILE.GOEA.HSJD.sel.CU.FECAL.molecular_function %>%
  distinct(GROUP.rec,ONTOLOGY, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))

FILE.GOEA.HSJD.sel.CU.FECAL.molecular_function <- left_join(FILE.GOEA.HSJD.sel.CU.FECAL.molecular_function, unique_combinations, by = "GROUP.rec")

FILE.GOEA.HSJD.sel.CU.FECAL.molecular_function <- FILE.GOEA.HSJD.sel.CU.FECAL.molecular_function %>%
  mutate(gene_rate = count / total_counts)


mean_EA <- FILE.GOEA.HSJD.sel.CU.FECAL.molecular_function %>%
  group_by(GROUP.rec,ONTOLOGY) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))



# Merge con el dataset original
FILE.GOEA.HSJD.sel.CU.FECAL.molecular_function <- left_join(FILE.GOEA.HSJD.sel.CU.FECAL.molecular_function, mean_EA, by = c("GROUP.rec", "ONTOLOGY"))

#HAY 647
ggballoonplot((FILE.GOEA.HSJD.sel.CU.FECAL.molecular_function), x = "GOEA_up_DOWN", y = "ONTOLOGY", size = "gene_rate",
              fill = "mean_EA_VALUE", facet.by = "GROUP.rec",
              ggtheme = theme_bw()) +  theme(axis.text=element_text(size=5))+ scale_fill_viridis_c(option = "C") +  labs( y= "GOEA MF",
                                                                                   x= "UP/DOWN",
                                                                                 title="GOEA ontologies - CU- MUESTRAS FECALES molecular_function")+
  # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#calculate frequency of position, grouped by team
#library(dplyr)
#aaa<-FILE.GOEA.HSJD.sel.CU.FECAL.molecular_function %>%  group_by(GROUP, GOEA_up_DOWN,ONTOLOGY) %>%  summarize(Mean = mean(EA_VALUE, na.rm=TRUE) ) %>%
#  arrange(GROUP, GOEA_up_DOWN, desc(Mean))
#write.table(aaa, file = "FILE.GOEA.HSJD.sel.CU.FECAL.molecular_function.txt", sep = "\t",            row.names = FALSE)

#crear unas tabla de frecuencias con porcentajes para cada ontology, dividido por up y down, ordenado por la frecuencia y quizas por E y FDR
table(FILE.GOEA.HSJD.sel.CU.FECAL.molecular_function$ONTOLOGY)
## 
## GO:0000049 GO:0000150 GO:0000155 GO:0000166 GO:0000287 GO:0002161 GO:0003676 
##          5          5          5          2          6          5          1 
## GO:0003678 GO:0003684 GO:0003746 GO:0003844 GO:0003852 GO:0003871 GO:0003872 
##          6          9          1          6          1          1          2 
## GO:0003887 GO:0003896 GO:0003911 GO:0003917 GO:0003918 GO:0003924 GO:0003951 
##         10         10          1         11          7          7          2 
## GO:0003952 GO:0003959 GO:0003964 GO:0003978 GO:0003979 GO:0003984 GO:0003993 
##          2          1          3          1          3          3          1 
## GO:0004022 GO:0004034 GO:0004044 GO:0004065 GO:0004066 GO:0004088 GO:0004129 
##          2          3          2          1          2          6          3 
## GO:0004134 GO:0004151 GO:0004160 GO:0004176 GO:0004177 GO:0004252 GO:0004326 
##          6          1          1          7          5          2          1 
## GO:0004347 GO:0004355 GO:0004356 GO:0004360 GO:0004373 GO:0004386 GO:0004399 
##          1          2          5          2          1          9          1 
## GO:0004427 GO:0004436 GO:0004467 GO:0004514 GO:0004516 GO:0004523 GO:0004527 
##          1          1          1          1          1          2          6 
## GO:0004553 GO:0004560 GO:0004563 GO:0004565 GO:0004642 GO:0004645 GO:0004650 
##          6          6          8          6          1          1          3 
## GO:0004654 GO:0004658 GO:0004674 GO:0004712 GO:0004743 GO:0004802 GO:0004813 
##          1          1          2          3          1          1          3 
## GO:0004814 GO:0004818 GO:0004822 GO:0004823 GO:0004824 GO:0004825 GO:0004826 
##          3          1          1          1          1          2          1 
## GO:0004832 GO:0004854 GO:0004888 GO:0005315 GO:0005436 GO:0005506 GO:0005524 
##          2          1          2          1          3          1          2 
## GO:0005525 GO:0008061 GO:0008081 GO:0008094 GO:0008137 GO:0008170 GO:0008173 
##          6          1          2          2          3          2          2 
## GO:0008184 GO:0008234 GO:0008236 GO:0008237 GO:0008270 GO:0008276 GO:0008408 
##          6          5          5          2         10          1          9 
## GO:0008422 GO:0008456 GO:0008483 GO:0008484 GO:0008658 GO:0008661 GO:0008705 
##          6          1          4         12          7          5          5 
## GO:0008728 GO:0008734 GO:0008760 GO:0008774 GO:0008795 GO:0008808 GO:0008861 
##          3          1          2          1          1          2          1 
## GO:0008878 GO:0008901 GO:0008927 GO:0008940 GO:0008955 GO:0008976 GO:0008998 
##          2          3          1          1          7          2          3 
## GO:0009002 GO:0009007 GO:0009011 GO:0009024 GO:0009032 GO:0009035 GO:0009044 
##          6          2          1          1          1          3          1 
## GO:0009381 GO:0015093 GO:0015159 GO:0015293 GO:0015297 GO:0015379 GO:0015420 
##          7          6          2          4          4          1          1 
## GO:0015562 GO:0015649 GO:0015655 GO:0015658 GO:0016149 GO:0016655 GO:0016746 
##          7          1          3          1          3          1          3 
## GO:0016757 GO:0016783 GO:0016787 GO:0016798 GO:0016810 GO:0016836 GO:0016874 
##          7          3          6          7          3          8          1 
## GO:0016879 GO:0016887 GO:0016987 GO:0018548 GO:0018826 GO:0019134 GO:0019829 
##          1          5          1          1          1          1          7 
## GO:0022857 GO:0030145 GO:0030170 GO:0030234 GO:0030246 GO:0030341 GO:0030570 
##          3          2          4          1          3          1          1 
## GO:0030596 GO:0030599 GO:0030955 GO:0030976 GO:0030983 GO:0031071 GO:0031419 
##          1          1          1          7          6          2          4 
## GO:0032440 GO:0032549 GO:0033201 GO:0033727 GO:0033765 GO:0033940 GO:0034335 
##          1          1          1          1          1          1          5 
## GO:0035596 GO:0036380 GO:0042910 GO:0042972 GO:0043023 GO:0043139 GO:0043169 
##          2          2          4          1          3          1          7 
## GO:0043546 GO:0043565 GO:0043780 GO:0044318 GO:0046537 GO:0046555 GO:0046556 
##          1          6          1          1          1          1          4 
## GO:0046872 GO:0046961 GO:0047004 GO:0047334 GO:0047474 GO:0047482 GO:0047632 
##          8          6          1          1          1          1          1 
## GO:0047753 GO:0047905 GO:0050242 GO:0050308 GO:0050418 GO:0050660 GO:0051082 
##          1          1          1          1          1          6          3 
## GO:0051287 GO:0051536 GO:0051538 GO:0051539 GO:0051669 GO:0052621 GO:0052692 
##          2          5          1          7          1          1          5 
## GO:0052761 GO:0052794 GO:0052795 GO:0052796 GO:0052856 GO:0052857 GO:0061634 
##          1          1          1          1          1          1          1 
## GO:0070204 GO:0071111 
##          1          1
ONTOLOGY_count <- FILE.GOEA.HSJD.sel.CU.FECAL.cellular_function %>%
  group_by(GROUP.rec, ONTOLOGY) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function with first_ancestor_count_2
FILE.GOEA.HSJD.sel.CU.FECAL.cellular_function <- merge(FILE.GOEA.HSJD.sel.CU.FECAL.cellular_function, ONTOLOGY_count, by = c("ONTOLOGY","GROUP.rec"), all.x = TRUE)

unique_combinations <- FILE.GOEA.HSJD.sel.CU.FECAL.cellular_function %>%
  distinct(GROUP.rec,ONTOLOGY, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))

FILE.GOEA.HSJD.sel.CU.FECAL.cellular_function <- left_join(FILE.GOEA.HSJD.sel.CU.FECAL.cellular_function, unique_combinations, by = "GROUP.rec")

FILE.GOEA.HSJD.sel.CU.FECAL.cellular_function <- FILE.GOEA.HSJD.sel.CU.FECAL.cellular_function %>%
  mutate(gene_rate = count / total_counts)


mean_EA <- FILE.GOEA.HSJD.sel.CU.FECAL.cellular_function %>%
  group_by(GROUP.rec,ONTOLOGY) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))



# Merge con el dataset original
FILE.GOEA.HSJD.sel.CU.FECAL.cellular_function <- left_join(FILE.GOEA.HSJD.sel.CU.FECAL.cellular_function, mean_EA, by = c("GROUP.rec", "ONTOLOGY"))


ggballoonplot((FILE.GOEA.HSJD.sel.CU.FECAL.cellular_function), x = "GOEA_up_DOWN", y = "ONTOLOGY", size = "gene_rate",
              fill = "mean_EA_VALUE", facet.by = "GROUP.rec",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA CF",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - CU- MUESTRAS FECALES cellular_function")+
  # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#calculate frequency of position, grouped by team
#aaa<-FILE.GOEA.HSJD.sel.CU.FECAL.cellular_function %>%  group_by(GROUP, GOEA_up_DOWN,ONTOLOGY) %>%  summarize(Mean = mean(EA_VALUE, na.rm=TRUE) ) %>%
#  arrange(GROUP, GOEA_up_DOWN, desc(Mean))
#write.table(aaa, file = "FILE.GOEA.HSJD.sel.CU.FECAL.cellular_function.txt", sep = "\t",            row.names = FALSE)

ONTOLOGY_count <- FILE.GOEA.HSJD.sel.CU.FECAL.biological_process %>%
  group_by(GROUP.rec, ONTOLOGY) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function with first_ancestor_count_2
FILE.GOEA.HSJD.sel.CU.FECAL.biological_process <- merge(FILE.GOEA.HSJD.sel.CU.FECAL.biological_process, ONTOLOGY_count, by = c("ONTOLOGY","GROUP.rec"), all.x = TRUE)

unique_combinations <- FILE.GOEA.HSJD.sel.CU.FECAL.biological_process %>%
  distinct(GROUP.rec,ONTOLOGY, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))

FILE.GOEA.HSJD.sel.CU.FECAL.biological_process <- left_join(FILE.GOEA.HSJD.sel.CU.FECAL.biological_process, unique_combinations, by = "GROUP.rec")

FILE.GOEA.HSJD.sel.CU.FECAL.biological_process <- FILE.GOEA.HSJD.sel.CU.FECAL.biological_process %>%
  mutate(gene_rate = count / total_counts)


mean_EA <- FILE.GOEA.HSJD.sel.CU.FECAL.biological_process %>%
  group_by(GROUP.rec,ONTOLOGY) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))



# Merge con el dataset original
FILE.GOEA.HSJD.sel.CU.FECAL.biological_process <- left_join(FILE.GOEA.HSJD.sel.CU.FECAL.biological_process, mean_EA, by = c("GROUP.rec", "ONTOLOGY"))


ggballoonplot((FILE.GOEA.HSJD.sel.CU.FECAL.biological_process), x = "GOEA_up_DOWN", y = "ONTOLOGY", size = "gene_rate",
              fill = "mean_EA_VALUE", facet.by = "GROUP.rec",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA BP",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - CU- MUESTRAS FECALES biological_process")+
  # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#calculate frequency of position, grouped by team
#aaa<-FILE.GOEA.HSJD.sel.CU.FECAL.biological_process %>%  group_by(GROUP, GOEA_up_DOWN,ONTOLOGY) %>%  summarize(Mean = mean(EA_VALUE, na.rm=TRUE) ) %>%
#  arrange(GROUP, GOEA_up_DOWN, desc(Mean))
#write.table(aaa, file = "FILE.GOEA.HSJD.sel.CU.FECAL.biological_process.txt", sep = "\t",            row.names = FALSE)

GO para cada grupo de CU / ORAL

#para cada grupo de CU / ORAL

ggplotGrid(ncol = 2,
           lapply(c("ONT_DESCRIPTION"),
                  function(col) {
                    ggplot(FILE.GOEA.HSJD.sel.CU.ORAL, aes_string(col)) + geom_bar() + coord_flip()
                  }))

ggballoonplot((FILE.GOEA.HSJD.sel.CU.ORAL.molecular_function), x = "GOEA_up_DOWN", y = "ONTOLOGY", size = "EA_VALUE",
              fill = "EA_VALUE", facet.by = "GROUP",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA ontologies MUESTRAL ORALES",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - CU- MUESTRAS ORALES molecular_function")+
  # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#aaa<-FILE.GOEA.HSJD.sel.CU.ORAL.molecular_function %>%  group_by(GROUP, GOEA_up_DOWN,ONTOLOGY) %>%  summarize(Mean = mean(EA_VALUE, na.rm=TRUE) ) %>%
#  arrange(GROUP, GOEA_up_DOWN, desc(Mean))
#write.table(aaa, file = "FILE.GOEA.HSJD.sel.CU.ORAL.molecular_function.txt", sep = "\t",            row.names = FALSE)



ggballoonplot((FILE.GOEA.HSJD.sel.CU.ORAL.cellular_function), x = "GOEA_up_DOWN", y = "ONTOLOGY", size = "EA_VALUE",
              fill = "EA_VALUE", facet.by = "GROUP",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA ontologies MUESTRAL ORALES",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - CU- MUESTRAS ORALES cellular_function")+
  # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#aaa<-FILE.GOEA.HSJD.sel.CU.ORAL.cellular_function %>%  group_by(GROUP, GOEA_up_DOWN,ONTOLOGY) %>%  summarize(Mean = mean(EA_VALUE, na.rm=TRUE) ) %>%
#  arrange(GROUP, GOEA_up_DOWN, desc(Mean))
#write.table(aaa, file = "FILE.GOEA.HSJD.sel.CU.ORAL.cellular_function.txt", sep = "\t",            row.names = FALSE)



ggballoonplot((FILE.GOEA.HSJD.sel.CU.ORAL.biological_process), x = "GOEA_up_DOWN", y = "ONTOLOGY", size = "EA_VALUE",
              fill = "EA_VALUE", facet.by = "GROUP",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA ontologies MUESTRAL ORALES",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - CU- MUESTRAS ORALES biological_process")+
  # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#aaa<-FILE.GOEA.HSJD.sel.CU.ORAL.biological_process %>%  group_by(GROUP, GOEA_up_DOWN,ONTOLOGY) %>%  summarize(Mean = mean(EA_VALUE, na.rm=TRUE) ) %>%
#  arrange(GROUP, GOEA_up_DOWN, desc(Mean))
#write.table(aaa, file = "FILE.GOEA.HSJD.sel.CU.ORAL.biological_process.txt", sep = "\t",            row.names = FALSE)

ANALISIS PARA para cada grupo de EC / FECAL

#EC:

ggplotGrid(ncol = 2,
           lapply(c("ONT_DESCRIPTION"),
                  function(col) {
                    ggplot(FILE.GOEA.HSJD.sel.EC.FECAL, aes_string(col)) + geom_bar() + coord_flip()
                  }))

ggballoonplot((FILE.GOEA.HSJD.sel.EC.FECAL.molecular_function), x = "GOEA_up_DOWN", y = "ONTOLOGY", size = "EA_VALUE",
              fill = "EA_VALUE", facet.by = "GROUP",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA ontologies MUESTRAL ORALES",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - EC- MUESTRAS FECALES molecular_function")+
  # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#aaa<-FILE.GOEA.HSJD.sel.EC.FECAL.molecular_function %>%  group_by(GROUP, GOEA_up_DOWN,ONTOLOGY) %>%  summarize(Mean = mean(EA_VALUE, na.rm=TRUE) ) %>%
#  arrange(GROUP, GOEA_up_DOWN, desc(Mean))
#write.table(aaa, file = "FILE.GOEA.HSJD.sel.EC.FECAL.molecular_function.txt", sep = "\t",            row.names = FALSE)


#table(FILE.GOEA.HSJD.sel.EC.FECAL.cellular_function$ONTOLOGY, FILE.GOEA.HSJD.sel.EC.FECAL.cellular_function$GROUP)

ggballoonplot((FILE.GOEA.HSJD.sel.EC.FECAL.cellular_function), x = "GOEA_up_DOWN", y = "ONTOLOGY", size = "EA_VALUE",
              fill = "EA_VALUE", facet.by = "GROUP",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA ontologies MUESTRAL ORALES",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - EC- MUESTRAS FECALES cellular_function")+
  # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#aaa<-FILE.GOEA.HSJD.sel.EC.FECAL.cellular_function %>%  group_by(GROUP, GOEA_up_DOWN,ONTOLOGY) %>%  summarize(Mean = mean(EA_VALUE, na.rm=TRUE) ) %>%
#  arrange(GROUP, GOEA_up_DOWN, desc(Mean))
#write.table(aaa, file = "FILE.GOEA.HSJD.sel.EC.FECAL.cellular_function.txt", sep = "\t",            row.names = FALSE)


ggballoonplot((FILE.GOEA.HSJD.sel.EC.FECAL.biological_process), x = "GOEA_up_DOWN", y = "ONTOLOGY", size = "EA_VALUE",
              fill = "EA_VALUE", facet.by = "GROUP",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA ontologies MUESTRAL ORALES",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - EC- MUESTRAS FECALES biological_process")+
  # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#aaa<-FILE.GOEA.HSJD.sel.EC.FECAL.biological_process %>%  group_by(GROUP, GOEA_up_DOWN,ONTOLOGY) %>%  summarize(Mean = mean(EA_VALUE, na.rm=TRUE) ) %>%
#  arrange(GROUP, GOEA_up_DOWN, desc(Mean))
#write.table(aaa, file = "FILE.GOEA.HSJD.sel.EC.FECAL.biological_process.txt", sep = "\t",            row.names = FALSE)

ANALISIS para cada grupo de EC / ORAL

#para cada grupo de EC / ORAL

ggplotGrid(ncol = 2,
           lapply(c("ONT_DESCRIPTION"),
                  function(col) {
                    ggplot(FILE.GOEA.HSJD.sel.EC.ORAL, aes_string(col)) + geom_bar() + coord_flip()
                  }))+
    # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

## NULL
ggballoonplot((FILE.GOEA.HSJD.sel.EC.ORAL.molecular_function), x = "GOEA_up_DOWN", y = "ONTOLOGY", size = "EA_VALUE",
              fill = "EA_VALUE", facet.by = "GROUP",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA ontologies MUESTRAL ORALES",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - EC- MUESTRAS ORALES molecular_function") +
    # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#aaa<-FILE.GOEA.HSJD.sel.EC.ORAL.molecular_function %>%  group_by(GROUP, GOEA_up_DOWN,ONTOLOGY) %>%  summarize(Mean = mean(EA_VALUE, na.rm=TRUE) ) %>%
#  arrange(GROUP, GOEA_up_DOWN, desc(Mean))
#write.table(aaa, file = "FILE.GOEA.HSJD.sel.EC.ORAL.molecular_function.txt", sep = "\t",            row.names = FALSE)



ggballoonplot((FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function), x = "GOEA_up_DOWN", y = "ONTOLOGY", size = "EA_VALUE",
              fill = "EA_VALUE", facet.by = "GROUP",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA ontologies MUESTRAL ORALES",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - EC- MUESTRAS ORALES cellular_function") +
    # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#aaa<-FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function %>%  group_by(GROUP, GOEA_up_DOWN,ONTOLOGY) %>%  summarize(Mean = mean(EA_VALUE, na.rm=TRUE) ) %>%
#  arrange(GROUP, GOEA_up_DOWN, desc(Mean))
#write.table(aaa, file = "FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function.txt", sep = "\t",            row.names = FALSE)


ggballoonplot((FILE.GOEA.HSJD.sel.EC.ORAL.biological_process), x = "GOEA_up_DOWN", y = "ONTOLOGY", size = "EA_VALUE",
              fill = "EA_VALUE", facet.by = "GROUP",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA ontologies MUESTRAL ORALES",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - EC- MUESTRAS ORALES biological_process") +
    # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#aaa<-FILE.GOEA.HSJD.sel.EC.ORAL.biological_process %>%  group_by(GROUP, GOEA_up_DOWN,ONTOLOGY) %>%  summarize(Mean = mean(EA_VALUE, na.rm=TRUE) ) %>%
#  arrange(GROUP, GOEA_up_DOWN, desc(Mean))
#write.table(aaa, file = "FILE.GOEA.HSJD.sel.EC.ORAL.biological_process.txt", sep = "\t",            row.names = FALSE)
#para segundo ancestor de EC / ORAL
ggplotGrid(
           lapply(c("first_ancestor_name"),
                  function(col) {
                    ggplot(FILE.GOEA.HSJD.sel.EC.ORAL, aes_string(col)) + geom_bar() + coord_flip()
                  }))+
    # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

## NULL
first_ancestor_count <- FILE.GOEA.HSJD.sel.EC.ORAL.molecular_function %>%
  group_by(GROUP.rec,first_ancestor_name) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.molecular_function with first_ancestor_count
FILE.GOEA.HSJD.sel.EC.ORAL.molecular_function <- merge(FILE.GOEA.HSJD.sel.EC.ORAL.molecular_function, first_ancestor_count, by = c("first_ancestor_name","GROUP.rec"), all.x = TRUE)

unique_combinations <- FILE.GOEA.HSJD.sel.EC.ORAL.molecular_function %>%
  distinct(GROUP.rec,first_ancestor_name, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))

FILE.GOEA.HSJD.sel.EC.ORAL.molecular_function <- left_join(FILE.GOEA.HSJD.sel.EC.ORAL.molecular_function, unique_combinations, by = "GROUP.rec")

FILE.GOEA.HSJD.sel.EC.ORAL.molecular_function <- FILE.GOEA.HSJD.sel.EC.ORAL.molecular_function %>%
  mutate(gene_rate = count / total_counts)

mean_EA <- FILE.GOEA.HSJD.sel.EC.ORAL.molecular_function %>%
  group_by(GROUP.rec,first_ancestor_name) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))

# Merge con el dataset original
FILE.GOEA.HSJD.sel.EC.ORAL.molecular_function <- left_join(FILE.GOEA.HSJD.sel.EC.ORAL.molecular_function, mean_EA, by = c("GROUP.rec", "first_ancestor_name"))


ggballoonplot((FILE.GOEA.HSJD.sel.EC.ORAL.molecular_function), x = "GOEA_up_DOWN", y = "first_ancestor_name", size = "gene_rate",
              fill = "mean_EA_VALUE", facet.by = "GROUP.rec",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA ontologies MUESTRAL ORALES",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - EC- MUESTRAS ORALES molecular_function") +
    # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#aaa<-FILE.GOEA.HSJD.sel.EC.ORAL.molecular_function %>%  group_by(GROUP, GOEA_up_DOWN,ONTOLOGY) %>%  summarize(Mean = mean(EA_VALUE, na.rm=TRUE) ) %>%
#  arrange(GROUP, GOEA_up_DOWN, desc(Mean))
#write.table(aaa, file = "FILE.GOEA.HSJD.sel.EC.ORAL.molecular_function.txt", sep = "\t",            row.names = FALSE)

first_ancestor_count_2 <- FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function %>%
  group_by(GROUP.rec,first_ancestor_name) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function with first_ancestor_count_2
FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function <- merge(FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function, first_ancestor_count_2, by = c("first_ancestor_name","GROUP.rec"), all.x = TRUE)

unique_combinations <- FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function %>%
  distinct(GROUP.rec,first_ancestor_name, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))

FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function <- left_join(FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function, unique_combinations, by = "GROUP.rec")

FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function <- FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function %>%
  mutate(gene_rate = count / total_counts)

mean_EA <- FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function %>%
  group_by(GROUP.rec,first_ancestor_name) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))

# Merge con el dataset original
FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function <- left_join(FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function, mean_EA, by = c("GROUP.rec", "first_ancestor_name"))

ggballoonplot((FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function), x = "GOEA_up_DOWN", y = "first_ancestor_name", size = "EA_VALUE",
              fill = "mean_EA_VALUE", facet.by = "GROUP.rec",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA ontologies MUESTRAL ORALES",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - EC- MUESTRAS ORALES cellular_function") +
    # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#aaa<-FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function %>%  group_by(GROUP, GOEA_up_DOWN,ONTOLOGY) %>%  summarize(Mean = mean(EA_VALUE, na.rm=TRUE) ) %>%
#  arrange(GROUP, GOEA_up_DOWN, desc(Mean))
#write.table(aaa, file = "FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function.txt", sep = "\t",            row.names = FALSE)

first_ancestor_count_3 <- FILE.GOEA.HSJD.sel.EC.ORAL.biological_process %>%
  group_by(GROUP.rec,first_ancestor_name) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function with first_ancestor_count_2
FILE.GOEA.HSJD.sel.EC.ORAL.biological_process <- merge(FILE.GOEA.HSJD.sel.EC.ORAL.biological_process, first_ancestor_count_3, by = c("first_ancestor_name","GROUP.rec"), all.x = TRUE)

unique_combinations <- FILE.GOEA.HSJD.sel.EC.ORAL.biological_process %>%
  distinct(GROUP.rec,first_ancestor_name, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))

FILE.GOEA.HSJD.sel.EC.ORAL.biological_process <- left_join(FILE.GOEA.HSJD.sel.EC.ORAL.biological_process, unique_combinations, by = "GROUP.rec")

FILE.GOEA.HSJD.sel.EC.ORAL.biological_process <- FILE.GOEA.HSJD.sel.EC.ORAL.biological_process %>%
  mutate(gene_rate = count / total_counts)

mean_EA <- FILE.GOEA.HSJD.sel.EC.ORAL.biological_process %>%
  group_by(GROUP.rec,first_ancestor_name) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))

# Merge con el dataset original
FILE.GOEA.HSJD.sel.EC.ORAL.biological_process <- left_join(FILE.GOEA.HSJD.sel.EC.ORAL.biological_process, mean_EA, by = c("GROUP.rec", "first_ancestor_name"))


ggballoonplot((FILE.GOEA.HSJD.sel.EC.ORAL.biological_process), x = "GOEA_up_DOWN", y = "first_ancestor_name", size = "gene_rate",
              fill = "mean_EA_VALUE", facet.by = "GROUP.rec",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA ontologies MUESTRAL ORALES",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - EC- MUESTRAS ORALES biological_process") +
    # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

 #aaa<-FILE.GOEA.HSJD.sel.EC.ORAL.biological_process %>%  group_by(GROUP, GOEA_up_DOWN,ONTOLOGY) %>%  summarize(Mean = mean(EA_VALUE, na.rm=TRUE) ) %>%
#  arrange(GROUP, GOEA_up_DOWN, desc(Mean))
#write.table(aaa, file = "FILE.GOEA.HSJD.sel.EC.ORAL.biological_process.txt", sep = "\t",            row.names = FALSE)

ANALISIS para cada grupo de EC / FECAL

#para cada grupo de EC / ORAL
ggplotGrid(lapply(c("first_ancestor_name"),
                  function(col) {
                    ggplot(FILE.GOEA.HSJD.sel.EC.FECAL, aes_string(col)) + geom_bar() + coord_flip()
                  }))+
    # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

## NULL
first_ancestor_count <- FILE.GOEA.HSJD.sel.EC.FECAL.molecular_function %>%
  group_by(GROUP.rec,first_ancestor_name) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function with first_ancestor_count_2
FILE.GOEA.HSJD.sel.EC.FECAL.molecular_function <- merge(FILE.GOEA.HSJD.sel.EC.FECAL.molecular_function, first_ancestor_count, by = c("first_ancestor_name","GROUP.rec"), all.x = TRUE)

unique_combinations <- FILE.GOEA.HSJD.sel.EC.FECAL.molecular_function %>%
  distinct(GROUP.rec,first_ancestor_name, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))

FILE.GOEA.HSJD.sel.EC.FECAL.molecular_function <- left_join(FILE.GOEA.HSJD.sel.EC.FECAL.molecular_function, unique_combinations, by = "GROUP.rec")

FILE.GOEA.HSJD.sel.EC.FECAL.molecular_function <- FILE.GOEA.HSJD.sel.EC.FECAL.molecular_function %>%
  mutate(gene_rate = count / total_counts)

mean_EA <- FILE.GOEA.HSJD.sel.EC.FECAL.molecular_function %>%
  group_by(GROUP.rec,first_ancestor_name) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))

# Merge con el dataset original
FILE.GOEA.HSJD.sel.EC.FECAL.molecular_function <- left_join(FILE.GOEA.HSJD.sel.EC.FECAL.molecular_function, mean_EA, by = c("GROUP.rec", "first_ancestor_name"))


ggballoonplot((FILE.GOEA.HSJD.sel.EC.FECAL.molecular_function), x = "GOEA_up_DOWN", y = "first_ancestor_name", size = "gene_rate",
              fill = "mean_EA_VALUE", facet.by = "GROUP.rec",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA ontologies MUESTRAL ORALES",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - EC- MUESTRAS ORALES molecular_function") +
    # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#aaa<-FILE.GOEA.HSJD.sel.EC.ORAL.molecular_function %>%  group_by(GROUP, GOEA_up_DOWN,ONTOLOGY) %>%  summarize(Mean = mean(EA_VALUE, na.rm=TRUE) ) %>%
#  arrange(GROUP, GOEA_up_DOWN, desc(Mean))
#write.table(aaa, file = "FILE.GOEA.HSJD.sel.EC.ORAL.molecular_function.txt", sep = "\t",            row.names = FALSE)


first_ancestor_count <- FILE.GOEA.HSJD.sel.EC.FECAL.cellular_function %>%
  group_by(GROUP.rec,first_ancestor_name) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function with first_ancestor_count_2
FILE.GOEA.HSJD.sel.EC.FECAL.cellular_function <- merge(FILE.GOEA.HSJD.sel.EC.FECAL.cellular_function, first_ancestor_count, by = c("first_ancestor_name","GROUP.rec"), all.x = TRUE)

unique_combinations <- FILE.GOEA.HSJD.sel.EC.FECAL.cellular_function %>%
  distinct(GROUP.rec,first_ancestor_name, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))

FILE.GOEA.HSJD.sel.EC.FECAL.cellular_function <- left_join(FILE.GOEA.HSJD.sel.EC.FECAL.cellular_function, unique_combinations, by = "GROUP.rec")

FILE.GOEA.HSJD.sel.EC.FECAL.cellular_function <- FILE.GOEA.HSJD.sel.EC.FECAL.cellular_function %>%
  mutate(gene_rate = count / total_counts)

mean_EA <- FILE.GOEA.HSJD.sel.EC.FECAL.cellular_function %>%
  group_by(GROUP.rec,first_ancestor_name) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))



# Merge con el dataset original
FILE.GOEA.HSJD.sel.EC.FECAL.cellular_function <- left_join(FILE.GOEA.HSJD.sel.EC.FECAL.cellular_function, mean_EA, by = c("GROUP.rec", "first_ancestor_name"))

ggballoonplot((FILE.GOEA.HSJD.sel.EC.FECAL.cellular_function), x = "GOEA_up_DOWN", y = "first_ancestor_name", size = "gene_rate",
              fill = "mean_EA_VALUE", facet.by = "GROUP.rec",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA ontologies MUESTRAL ORALES",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - EC- MUESTRAS ORALES cellular_function") +
    # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#aaa<-FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function %>%  group_by(GROUP, GOEA_up_DOWN,ONTOLOGY) %>%  summarize(Mean = mean(EA_VALUE, na.rm=TRUE) ) %>%
#  arrange(GROUP, GOEA_up_DOWN, desc(Mean))
#write.table(aaa, file = "FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function.txt", sep = "\t",            row.names = FALSE)


first_ancestor_count <- FILE.GOEA.HSJD.sel.EC.FECAL.biological_process %>%
  group_by(GROUP.rec,first_ancestor_name) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function with first_ancestor_count_2
FILE.GOEA.HSJD.sel.EC.FECAL.biological_process <- merge(FILE.GOEA.HSJD.sel.EC.FECAL.biological_process, first_ancestor_count, by = c("first_ancestor_name","GROUP.rec"), all.x = TRUE)

unique_combinations <- FILE.GOEA.HSJD.sel.EC.FECAL.biological_process %>%
  distinct(GROUP.rec,first_ancestor_name, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))

FILE.GOEA.HSJD.sel.EC.FECAL.biological_process <- left_join(FILE.GOEA.HSJD.sel.EC.FECAL.biological_process, unique_combinations, by = "GROUP.rec")

FILE.GOEA.HSJD.sel.EC.FECAL.biological_process <- FILE.GOEA.HSJD.sel.EC.FECAL.biological_process %>%
  mutate(gene_rate = count / total_counts)

mean_EA <- FILE.GOEA.HSJD.sel.EC.FECAL.biological_process %>%
  group_by(GROUP.rec,first_ancestor_name) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))



# Merge con el dataset original
FILE.GOEA.HSJD.sel.EC.FECAL.biological_process <- left_join(FILE.GOEA.HSJD.sel.EC.FECAL.biological_process, mean_EA, by = c("GROUP.rec", "first_ancestor_name"))

ggballoonplot((FILE.GOEA.HSJD.sel.EC.FECAL.biological_process), x = "GOEA_up_DOWN", y = "first_ancestor_name", size = "gene_rate",
              fill = "mean_EA_VALUE", facet.by = "GROUP.rec",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA ontologies MUESTRAL ORALES",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - EC- MUESTRAS ORALES biological_process") +
    # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#aaa<-FILE.GOEA.HSJD.sel.EC.ORAL.biological_process %>%  group_by(GROUP, GOEA_up_DOWN,ONTOLOGY) %>%  summarize(Mean = mean(EA_VALUE, na.rm=TRUE) ) %>%
#  arrange(GROUP, GOEA_up_DOWN, desc(Mean))
#write.table(aaa, file = "FILE.GOEA.HSJD.sel.EC.ORAL.biological_process.txt", sep = "\t",            row.names = FALSE)

GO para cada grupo de CU / ORAL

#para cada grupo de CU / ORAL

ggplotGrid(lapply(c("first_ancestor_name"),
                  function(col) {
                    ggplot(FILE.GOEA.HSJD.sel.CU.ORAL, aes_string(col)) + geom_bar() + coord_flip()
                  }))

first_ancestor_count <- FILE.GOEA.HSJD.sel.CU.ORAL.molecular_function %>%
  group_by(GROUP.rec,first_ancestor_name) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function with first_ancestor_count_2
FILE.GOEA.HSJD.sel.CU.ORAL.molecular_function <- merge(FILE.GOEA.HSJD.sel.CU.ORAL.molecular_function, first_ancestor_count, by = c("first_ancestor_name","GROUP.rec"), all.x = TRUE)

unique_combinations <- FILE.GOEA.HSJD.sel.CU.ORAL.molecular_function %>%
  distinct(GROUP.rec,first_ancestor_name, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))

FILE.GOEA.HSJD.sel.CU.ORAL.molecular_function <- left_join(FILE.GOEA.HSJD.sel.CU.ORAL.molecular_function, unique_combinations, by = "GROUP.rec")

FILE.GOEA.HSJD.sel.CU.ORAL.molecular_function <- FILE.GOEA.HSJD.sel.CU.ORAL.molecular_function %>%
  mutate(gene_rate = count / total_counts)

mean_EA <- FILE.GOEA.HSJD.sel.CU.ORAL.molecular_function %>%
  group_by(GROUP.rec,first_ancestor_name) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))



# Merge con el dataset original
FILE.GOEA.HSJD.sel.CU.ORAL.molecular_function <- left_join(FILE.GOEA.HSJD.sel.CU.ORAL.molecular_function, mean_EA, by = c("GROUP.rec", "first_ancestor_name"))


ggballoonplot((FILE.GOEA.HSJD.sel.CU.ORAL.molecular_function), x = "GOEA_up_DOWN", y = "first_ancestor_name", size = "gene_rate",
              fill = "mean_EA_VALUE", facet.by = "GROUP.rec",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA ontologies MUESTRAL ORALES",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - CU- MUESTRAS ORALES molecular_function")+
  # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#aaa<-FILE.GOEA.HSJD.sel.CU.ORAL.molecular_function %>%  group_by(GROUP, GOEA_up_DOWN,ONTOLOGY) %>%  summarize(Mean = mean(EA_VALUE, na.rm=TRUE) ) %>%
#  arrange(GROUP, GOEA_up_DOWN, desc(Mean))
#write.table(aaa, file = "FILE.GOEA.HSJD.sel.CU.ORAL.molecular_function.txt", sep = "\t",            row.names = FALSE)

first_ancestor_count <- FILE.GOEA.HSJD.sel.CU.ORAL.cellular_function %>%
  group_by(GROUP.rec,first_ancestor_name) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function with first_ancestor_count_2
FILE.GOEA.HSJD.sel.CU.ORAL.cellular_function <- merge(FILE.GOEA.HSJD.sel.CU.ORAL.cellular_function, first_ancestor_count, by = c("first_ancestor_name","GROUP.rec"), all.x = TRUE)

unique_combinations <- FILE.GOEA.HSJD.sel.CU.ORAL.cellular_function %>%
  distinct(GROUP.rec,first_ancestor_name, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))

FILE.GOEA.HSJD.sel.CU.ORAL.cellular_function <- left_join(FILE.GOEA.HSJD.sel.CU.ORAL.cellular_function, unique_combinations, by = "GROUP.rec")

FILE.GOEA.HSJD.sel.CU.ORAL.cellular_function <- FILE.GOEA.HSJD.sel.CU.ORAL.cellular_function %>%
  mutate(gene_rate = count / total_counts)

mean_EA <- FILE.GOEA.HSJD.sel.CU.ORAL.cellular_function %>%
  group_by(GROUP.rec,first_ancestor_name) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))



# Merge con el dataset original
FILE.GOEA.HSJD.sel.CU.ORAL.cellular_function <- left_join(FILE.GOEA.HSJD.sel.CU.ORAL.cellular_function, mean_EA, by = c("GROUP.rec", "first_ancestor_name"))

ggballoonplot((FILE.GOEA.HSJD.sel.CU.ORAL.cellular_function), x = "GOEA_up_DOWN", y = "first_ancestor_name", size = "gene_rate",
              fill = "mean_EA_VALUE", facet.by = "GROUP.rec",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA ontologies MUESTRAL ORALES",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - CU- MUESTRAS ORALES cellular_function")+
  # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 8, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#aaa<-FILE.GOEA.HSJD.sel.CU.ORAL.cellular_function %>%  group_by(GROUP, GOEA_up_DOWN,ONTOLOGY) %>%  summarize(Mean = mean(EA_VALUE, na.rm=TRUE) ) %>%
#  arrange(GROUP, GOEA_up_DOWN, desc(Mean))
#write.table(aaa, file = "FILE.GOEA.HSJD.sel.CU.ORAL.cellular_function.txt", sep = "\t",            row.names = FALSE)


first_ancestor_count <- FILE.GOEA.HSJD.sel.CU.ORAL.biological_process %>%
  group_by(GROUP.rec,first_ancestor_name) %>%
  summarise(count = n())
# Merge FILE.GOEA.HSJD.sel.EC.ORAL.cellular_function with first_ancestor_count_2
FILE.GOEA.HSJD.sel.CU.ORAL.biological_process <- merge(FILE.GOEA.HSJD.sel.CU.ORAL.biological_process, first_ancestor_count, by = c("first_ancestor_name","GROUP.rec"), all.x = TRUE)

unique_combinations <- FILE.GOEA.HSJD.sel.CU.ORAL.biological_process %>%
  distinct(GROUP.rec,first_ancestor_name, .keep_all = TRUE) %>%
  group_by(GROUP.rec) %>%
  summarise(total_counts = sum(count))

FILE.GOEA.HSJD.sel.CU.ORAL.biological_process <- left_join(FILE.GOEA.HSJD.sel.CU.ORAL.biological_process, unique_combinations, by = "GROUP.rec")

FILE.GOEA.HSJD.sel.CU.ORAL.biological_process <- FILE.GOEA.HSJD.sel.CU.ORAL.biological_process %>%
  mutate(gene_rate = count / total_counts)

mean_EA <- FILE.GOEA.HSJD.sel.CU.ORAL.biological_process %>%
  group_by(GROUP.rec,first_ancestor_name) %>%
  summarise(mean_EA_VALUE = mean(EA_VALUE, na.rm = TRUE))



# Merge con el dataset original
FILE.GOEA.HSJD.sel.CU.ORAL.biological_process <- left_join(FILE.GOEA.HSJD.sel.CU.ORAL.biological_process, mean_EA, by = c("GROUP.rec", "first_ancestor_name"))

ggballoonplot((FILE.GOEA.HSJD.sel.CU.ORAL.biological_process), x = "GOEA_up_DOWN", y = "first_ancestor_name", size = "gene_rate",
              fill = "mean_EA_VALUE", facet.by = "GROUP.rec",
              ggtheme = theme_bw()) +  scale_fill_viridis_c(option = "C") +  labs( y= "GOEA ontologies MUESTRAL ORALES",
                                                                                   x= "UP/DOWN",
                                                                                   title="GOEA ontologies - CU- MUESTRAS ORALES biological_process")+
  # Change the appearance of titles and labels
 font("title", size = 8, color = "red", face = "bold.italic")+
 font("subtitle", size = 10, color = "orange")+
 font("caption", size = 5, color = "orange")+
 font("xlab", size = 5, color = "blue")+
 font("ylab", size = 10, color = "red", face = "bold")+
 font("xy.text", size = 2, color = "#993333", face = "bold") +

# Change the appearance of legend title and texts
 font("legend.title", color = "blue", face = "bold")+
 font("legend.text", color = "red")

#aaa<-FILE.GOEA.HSJD.sel.CU.ORAL.biological_process %>%  group_by(GROUP, GOEA_up_DOWN,ONTOLOGY) %>%  summarize(Mean = mean(EA_VALUE, na.rm=TRUE) ) %>%
#  arrange(GROUP, GOEA_up_DOWN, desc(Mean))
#write.table(aaa, file = "FILE.GOEA.HSJD.sel.CU.ORAL.biological_process.txt", sep = "\t",            row.names = FALSE)